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Abstract 

Given an n-length input signal x, it is well known that its Discrete Fourier Transform (DFT), X, can be 
computed in 0(n log n) complexity using a Fast Fourier Transform (FFT). If the spectrum X is exactly /c-sparse 
(where k << n), can we do better? We show that asymptotically in k and n, when k is sub-linear in n (precisely, 
k (xn^ where < S < 1), and the support of the non-zero DFT coefficients is uniformly random, we can exploit 
this sparsity in two fundamental ways (i) sample complexity: we need only M = rk deterministically chosen 
samples of the input signal x (where r < 4 when < 6 < 0.99); and (ii) computational complexity: we can 
reliably compute the DFT X using 0{klogk) operations, where the constants in the big Oh are small and are 
related to the constants involved in computing a small number of DFTs of length approximately equal to the 
sparsity parameter k. Our algorithm succeeds with high probability, with the probability of failure vanishing to 
zero asymptotically in the number of samples acquired, M. 

Our approach is based on filterless subsampling of the input signal x using a small set of carefully chosen 
uniform subsampling patterns guided by the Chinese Remainder Theorem (CRT). The idea is to cleverly exploit, 
rather than avoid, the resulting aliasing artifacts induced by this subsampling. Specifically, our subsampling 
operation on x is designed to create aliasing patterns on the spectrum X that "look like" parity-check constraints 
of good erasure-correcting sparse-graph codes. We show how computing the sparse DFT X is equivalent to 
decoding of these sparse-graph codes. These codes further allow for fast peeling-style decoding, similar to that 
of fountain codes. The resulting DFT computation is low in both sample complexity and decoding complexity. 
We accordingly dub our algorithm the FFAST (Fast Fourier Aliasing-based Sparse Transform) algorithm. We 
analytically connect our proposed CRT-based aliasing framework to random sparse-graph codes, and analyze the 
performance of our algorithm using density evolution techniques from coding theory. Additionally our approach 
uncovers a new deterministic and efficient class of compressive sensing measurement matrices as an alternative 
to the popular choice of random matrices. Based on the qualitative nature of the subsampling patterns needed, our 
analysis is decomposed into two regimes: i) "very-sparse" (0 < J < 1/3), where M < 2.5k, and ii) "less-sparse" 
(1/3 <S < 1), where M < 4k for S < 0.99. 

While our theory is asymptotic, our simulation results reveal that in practice our algorithm works even for 
moderate values of k and n, e.g. k = 200 and n = 4080. Further, when k = 300, and n is approximately 3.8 
million, our algorithm achieves computational savings by a factor of more than 6000, and savings in the number 
of input samples by a factor of more than 3900 over the standard FFT. We analyze the 1-D DFT here, but our 
approach can be extended in a straightforward manner to multi-dimensional DFTs. Further, while our analysis is 
for the exactly sparse case, our approach can be extended to be robust to the more general approximately sparse 
and noisy measurement cases as well, albeit at the cost of increased sample and computational complexity. We 



provide extensive simulation results in Section VIII that corroborate our theoretical findings, and validate the 
empirical performance of the FFAST algorithm for a wide variety of ID and 2D DFT settings for signals having 
an exactly or approximately sparse Fourier spectrum. 



I. Introduction 



Spectral analysis using the Discrete Fourier Transform (DFT) has been of universal importance in 
engineering and scientific applications for a long time. The Fast Fourier Transform (FFT) is the fastest 



(a) 2-D Fourier transform of a differ- 
ence image between two consecutive ax- 
ial slices of "Brain" MRI image. 



(b) Exactly k — 3250 sparse n — 
195 X 308 dimensional differential Brain 
image. 



(c) Perfectly reconstructed image us- 
ing the FFAST algorithm from M = 
2.74k = 8910 samples of the 2-D 
Fourier transform (Fig. |l(a)| ). 



Fig. 1: An application of our FFAST algorithm to reconstruct the differentially sparse "Brain" image in an Magnetic 
Resonance Imaging (MRI) setting. In MRI, recall that samples are acquired in the Fourier domain, and the challenge is to 
speed up acquisition time by minimizing the number of samples needed to reconstruct the desired spatial domain image, a) 
Shows the spectrum of a difference image between two consecutive axial slices of the Brain. In MRI, samples are acquired 
in this domain, b) Shows the spatial domain differential Brain image of size n = 195 x 308 with exactly k = 3250 non- 
zero pixels (sparsity is approximately 5%). The FFAST algorithm acquires less than 15% of the Fourier samples (sample 
complexity less than 3k), and processes them using a low-complexity (0{klogk)) FFAST algorithm to perfectly, up to 
machine precision, reconstruct it as shown in part (c). 



known way to compute the DFT of an arbitrary n-length signal, and has a computational complexitjj^ 
of O(nlogn). Many applications of interest involve signals, e.g. relating to audio, image, and video 
data, seismic signals, biomedical signals, financial data, social graph data, cognitive radio applications, 
surveillance data, satellite imagery, etc., which have a sparse Fourier spectrum. In such cases, a small 
subset of the spectral components typically contain most or all of the signal energy, with most spectral 
components being either zero or negligibly small. If the n-length DFT, X, of x, is fc-sparse, where 
k « n, can we do better in terms of both sample and computational complexity of computing the 
sparse DFT? 

We answer this question affirmatively and show how to exploit the spectral domain sparsity to reduce 
the sample and computational complexity. As a motivating example, we have applied our proposed 
FFAST (Fast Fourier Aliasing-based Sparse Transform) algorithm to reconstructing the differentially 
sparse "Brain" image (see Fig. |l(b)| ) acquired in an Magnetic Resonance Imaging (MRI) application. In 
MRI, recall that samples are acquired in the Fourier domain, and the challenge is to speed up acquisition 
time by minimizing the number of samples needed to reconstruct the desired spatial domain image. We 
have an n = 195 x 308 difference image of two consecutive axial slices of the Brain, as shown in 
Fig. 1(b) , which is k = 3250-sparse (i.e., approximately 5%-sparsity). Our algorithm synthesizes a big 
195 X 308 size 2L^-DFT using a smaller set of short 2L^-DFTs of size 15 x 77,44 x 39 and 65 x 28 
(two DFTs of each size) and successfully reconstructs the original image perfectly up to a machine 
precision, as shown in Fig. 1(c) The FFAST algorithm acquires only 14.84% of Fourier samples (see 
Fig. |l(a)| ), i.e. with a sample complexity of M = 2.74/c. 

This application leads to some interesting observations. First, while our theory is asymptotic in k and 
n, and targets the regime where k is sub-linear in n, in practice it applies even for moderate values where 
the sparsity is about 5%, as in the above MRI image example. Secondly, while our analysis is based 
on a uniformly random sparsity model, our algorithm seems applicable more generally to real-world 



^Recall that a single variable function /(x) is said to be 0{g{x)), if for a sufficiently large x the function is bounded above by 

|p(x)|, i.e., linicc^oo \f{x)\ < c\g{x)\ for some constant c. Similarly, f{x) = Q{g{x)) if linicc^oo \f{x)\ > c\g{x)\ and f{x) = o{g{x)) 
if the growth rate of as x ^ oo, is negligible as compared to that of |^(x)|, i.e. linix^oo \f{x)\/\g{x)\ = 0. 
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"clustered" sparsity settings, such as the MRI image of Fig |l(b)[ 

The focus of this paper is however on 1-D signals having an exactly- sparse DFT. More precisely, our 
analysis focuses on the set of 1-D n-length signals x whose n-point DFT, X, has at most k non-zero 
coefficients that are uniformly randomly distributed. Our motivation for this focused study is three- 
fold: (i) to provide conceptual clarity of our proposed approach in a noiseless setting; (ii) to present 
our deterministic subsampling front-end measurement subsystem as a viable alternative to the class of 
randomized measurement matrices popular in the compressive sensing literature yj, |2J; and (iii) to 
explore the fundamental limits on both sample complexity and computational complexity for an exact- 
sparse DFT, which is of intellectual interest. This will allow us to address questions like: "Can we 
exploit the sparse structure of the DFT to be computationally more efficient than the O(nlogn) FFT 
which applies more generally?" and "Can we make the sample and computational complexity dependent 
only on kT 

The key insights derived from analyzing the exactly sparse signal model apply more broadly to the 
setting of approximately sparse signals (or where the measurements are further corrupted by noise). In 
Section [Vni] (see Figs. [T4l[l5(g)| ) we empirically validate the noise robustness of our proposed algorithm, 
under a Gaussian noise model for the tail coefficients, for settings involving an approximately sparse 
spectrum. However, unlike the exactly sparse case, for the noisy setting, the sample and the computational 
complexity of the FFAST algorithm are not 4/c and 0{k\ogk) any more. This is not surprising; as 
is well known in the literature, the measurment complexity for the noisy case is lower bounded by 
Vt{k\og{n/k)) [[3|, Q. While we provide empirical evidence of the noise robustness of our FFAST 
algorithm, we do not provide an analytical characterization of sample or computational complexity for 
such a noisy setting in this paper. Such an analysis is part of our ongoing and future work. 

Our main theoretical result is that asymptotically in k and n, when k is sub-linear in n (precisely, 
k (X where < < 1), we can exploit the Fourier domain sparsity in two fundamental ways (i) 
sample complexity: we need to keep no more than rk samples of x (where r < 4 when < 5 < 0.99 1^; 
and (ii) computational complexity: we can reliably compute the DFT X using 0{klogk) operations, 
where the constants in the big Oh are small and are related to the constants involved in computing a small 
number of DFTs of length approximately equal to the sparsity parameter k. Our algorithm succeeds 
with high probability, which we will use henceforth to mean, with probability at least 1 — 0(1/M), 
where M is the sample complexity of the FFAST algorithm. 

We emphasize the following caveats. First, as mentioned, our analytical results are probabilistic, 
and work asymptotically in k and n, with a success probability that approaches 1 asymptotically. This 
contrasts the 0{n log n) FFT algorithm which works deterministically for all values of k and n. Secondly, 
in our 0{klogk) computational complexity, we have not accounted for any I/O costs of reading and 
writing to/from memory, and assume the presence of random-access-memory. Thirdly, as mentioned, 
we assume a uniformly random model for the support of the non-zero DFT coefficients. Lastly, we 
require the signal length n to be a product of a few (typically 3 to 9) distinct prime^ 

In effect, our algorithm trades off sample and computational complexity for asymptotically zero 
probability of failure guarantees in a non-adversarial sparsity setting, and is applicable whenever k is 
sub-linear in n (i.e. k is o(n)), but is obviously most attractive when k is much smaller than n. As a 
concrete example, when k = 300, and n = 2^3^5^ ^ 3.8 x 10^, our algorithm achieves computational 
savings by a factor of more than 6000, and savings in the number of input samples by a factor of 
more than 3900 over the standard FFT (see [5] for computational complexity of a prime factor FFT 



^Our analysis applies for any value of < ^ < 1. The sample complexity M will be no more than rk for some oversampling ratio r 
that increases as 6 approaches 1 (the linear regime). For example, when 6 = 0.999, r is no more than 5, and for 6 = 0.9999, r is no 
more than 6. See Fig. [3] and Table |ll| in Section |lll| The range from < ^ < 0.99 essentially covers the entire sub-linear sparsity regime 
of practical interest, and is therefore used as the default regime in the title of this paper. 

^This is not a major restriction as in many problems of interest, the choice of n is available to the system designer, and choosing n to 
be a power of 2 is often invoked only to take advantage of the readily-available radix-2 FFT algorithms. 
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Fig. 2: Schematic block diagram of the FFAST architecture. The n-point input signal x is uniformly subsampled by a 
carefully chosen set of patterns, guided by the Chinese-Remainder-Theorem, to obtain dD sub-streams. Each of the d 
stages has D sub-streams corresponding to the D delays. For the exactly sparse setting, D = 2. A sub-stream is of length 
approximately equal to the sparsity k, resulting in an aggregate number of samples M = D Yli=o fi — ^ small 

constant r. Next, the (short) DFTs, of each of the resulting sub-streams are computed using an efficient FFT algorithm of 
choice. The big n-point DFT X is then synthesized from the smaller DFTs using the peeling-like FFAST decoder. Although 
in this paper we focus on the low-complexity peeling-like FFAST decoder, in principle one can replace it with any decoder 
of choice that can synthesize the DFT X from the smaller DFTs, e.g., a convex program like -minimization. This can be 
particularly useful in settings when the DFT X is approximately sparse or when the sampled data is corrupted by noise. 
Analysis of such an architecture is beyond the scope of this paper. 



algorithm). This can be a significant advantage in many existing apphcations, as well as enable new 
classes of applications not thought practical so far. 

Main Idea: At a high level, our algorithm uses a small and structured set of uniform subsam- 
pling operations applied directly on the input signal. It is well known from basic sampling theory 
that subsampling without anti-alias filtering creates spectral aliasing that destroys the original signal 
spectrum. Our key idea is to exploit rather than avoid this aliasing. This is done by recognizing that a 
carefully designed subsampling operation will induce spectral aliasing artifacts that look like the parity 
constraints of good erasure-correcting codes that additionally have a fast peeling-decoding algorithm 
(see Section |IV-B| for more details). As a result, both the sample complexity and the computational 
complexity of our algorithm are low. 

Our main inspiration is drawn from the class of sparse graph-codes for erasure channels studied in 
coding theory, e.g., Low-Density-Parity-Check (LDPC) codes [[6|, fountain codes Q, [[8|, verification 
codes [|9|, etc. Why? These codes have two important properties that we would like to inherit: (a) they 
have very low computational complexity (iterative peeling-based) decoder; and (b) they are near-capacity 
achieving for the erasure channel. The first property bestows the desired low computational complexity, 
while the second property ensures that the sample complexity of the algorithm is near-optimal. 

But how do we achieve this goal? We cannot induce any arbitrary code in the spectral domain 
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at our will as we can control only the subsampling operation on the time-domain signal. The key 
idea is to design subsampling patterns, guided by the Chinese-Remainder-Theorem (CRT), that create 
the desired code-like aliasing patterns. As we will describe in Section |V} the subsampling patterns are 
based on relatively co-prime integers for the very-sparse regime (sparsity-index < < 1/3). When the 
spectrum is less-sparse (1/3 < 5 < 1), the subsampling patterns are a bit more complicated, comprising 
of "cyclically- shifted" overlapping co-prime integers. See Section |VI] for details. 

Our approach is summarized in Fig. [2j As a concrete example, if the signal x is of length n ^ 10^ and 
its DFT X has k = 200 non-zero coefficients, then the FFAST algorithm first generates 6 (d = 3, D = 2) 
uniformly sampled sub-streams of the input data, each of size ^ 100 (fi ^ 100, i = 0, 1, 2), and then 
synthesizes X from the short DFTs of these 6 sub-streams. Due to the linear-time (in k) decoding 
complexity of the peeling-like FFAST decoder, our main computational bottleneck is in taking the 
small constant number of short (linear in k) DFTs. The precise number and the size of the short DFTs 



is quantified in Section III For the entire range of practical interest of sub-linear sparsity (i.e. k oc 
where < 5 < 0.99), the overall sample complexity of the FFAST algorithm is no more than 4/c, 
and the computational complexity is 0{klogk) with small constants in the big-Oh. This is particularly 
gratifying because both the sample complexity and the computational complexity depend only on the 
sparsity parameter fc, which is sub-linear in n. 

Related Work: The problem of computing a sparse discrete Fourier transform of a signal is related 
to the rich literature of frequency estimation [[TO)-[[T^ in statistical signal processing as well as to 
compressive-sensing [T|, Q. In frequency estimation, it is assumed that a signal consists of k complex 
exponentials in the presence of noise and the focus is on 'super-resolution' spectral estimation techniques 
based on well-studied statistical methods like MUSIC and ESPRIT fTOll-fTS']. The methods used are 
based on subspace decomposition principles. In contrast, we take a different approach combining tools 
from coding theory, number theory, graph theory and signal processing. In compressive sensing, the 
objective is to reliably reconstruct the sparse signal from as few measurements as possible, using a fast 
recovery technique. The bulk of this literature concentrates on random linear measurements, followed 
by either convex programming or greedy pursuit reconstruction algorithms Q, JT4| |, [[T5j. An alternative 
approach, in the context of sampling a continuous time signal with a finite rate of innovation is explored 
in [|16||-[|19||. Unlike the compressive sensing problem, where the resources to be optimized are the 
number of measurement^ (each measurement can further be a linear combination of multiple input 
samples) and the recovery cost, in our problem, we want to minimize the number of input samples (i.e., 
sample complexity) processed by an algorithm in addition to the recovery cost. 

At a higher level though, despite some key differences in our approach to the problem of computing 
a sparse DFT, our problem is indeed closely related to the spectral estimation and compressive sensing 
literature, and our approach is naturally inspired by this, and draws from the rich set of tools offered 
by this literature. 

A number of previous works [[20|-p4|| have addressed the problem of computing a 1-D DFT of 
a discrete-time signal that has a sparse Fourier spectrum, in sub-linear sample and time complexity. 
Most of these algorithms achieve a sub-linear time performance by first isolating the non-zero DFT 
coefficients into different bins, using specific filters or windows that have 'good' (concentrated) support 
in both, time and frequency. The non-zero DFT coefficients are then recovered iteratively, one at a time. 
The filters or windows used for the binning operation are typically of length 0{k\og{n)). As a result, 
the sample and computational complexity is typically 0{k\og{n)) or more. Moreover the constants 
involved in the big-Oh notation can be large, e.g., the empirical evaluation of presented in [ [25 | 
shows that for n = 2^^ and k = 7000, the number of samples required are M ?^ 2^^ = 300/c which 



^Consider a compressive sensing problem with a measurement matrix A, i.e., y = Ax, where y is a measurement vector and x is the 
input signal. Then, the sample complexity is equal to the number of non-zero columns of A and the measurement complexity is equal to 
the number of non-zero rows of A. 
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is 75 times more than the sample complexity 4/c of the FFAST algorithnj^ The work of [22] provides 
an excellent tutorial on some of the key ideas used by most sub-linear time sparse FFT algorithms in 
the literature. While we were writing this paper, we became aware of a recent work [ |26| |, in which 
the authors consider the problem of computing a noisy as well as an exactly- sparse 2-D DFT of size 
^Jn X ^Jn signal. For an exactly sparse signal, and when k = ^Jn, the algorithm in [|26| uses 0{k) 
samples to compute the 2-D DFT of the signal in 0{k log(/c)) time with a constant probability of failure 
(that is controllable but that does not appear go to zero asymptotically). In [27], the author proposes a 
sub-linear time algorithm with a sample complexity of 0(/clog^n) or 0(/c^log^n) and computational 
complexity of 0{k\o^ n) or 0{k^ log^ n) to compute a sparse DFT, with high probability or zero-error 
respectively. The algorithm in [27] exploits the Chinese-Remainder-Theorem, along with O {poly {log n)) 
number of subsampling patterns to identify the locations of the non-zero DFT coefficients. In contrast, 
the FFAST algorithm exploits the CRT to induce 'good' sparse-graph codes using a small constant 
number of subsampling patterns and computes the sparse DFT with a vanishing probability of failure. 

In summary, to the best of our knowledge, the FFAST algorithm is the first that we are aware of to 
compute an exactly /c-sparse n-point DFT that has all of the following features: 

• it has 0{k) sample complexity and 0{klogk) computational complexity; 

• it covers the entire sub-linear regime (/c oc n^, < 5 < 1); 

• it has a probability of failure that vanishes to zero asymptotically; 

• it features the novel use of the Chinese Remainder Theorem to guide the design of a small 
deterministic set of uniform subsampling patterns that induce good sparse-graph channel codes. 

The rest of the paper is organized as follows. Section [II] states the problem and introduces the 
notations. Section [nl] presents our main results, and states the main theorem. Section [TV] exemplifies the 
mapping of computing the DFT to decoding over an appropriate sparse-graph code, and also provides 
review of the relevant background information on sampling, aliasing, and decoding on bipartite graphs. 
Sections [V] and [Vl| provide the performance analysis of the FFAST algorithm for the very -sparse and the 
less-sparse regimes respectively, and can be bridged by a reader more interested in the practical aspects 
of our algorithm. Section jVIIIj provides extensive simulation results that corroborate our theoretical 
findings, and validate the empirical performance of the FFAST algorithm for a wide variety of ID and 
2D DFT settings for signals having an exactly or approximately sparse Fourier spectrum. 



II. Problem formulation, notations and preliminaries 
A. Problem formulation 

Consider an n-length discrete-time signal x that is the sum of << n complex exponentials, i.e., its 
n-length discrete Fourier transform has k non-zero coefficients: 

k-i 

x[p] = J2 a,e2--^^/^ p = 0, 1, . . . , n - 1, (1) 

q=0 

where the discrete frequencies cj^ G {0, 1, . . . , n — 1} and the amplitudes G C, for g = 0, 1, . . . , — 1. 
We consider the problem of identifying the k unknown frequencies uOq and the corresponding complex 
amplitudes from the time domain samples x. A straightforward solution is to compute an n-length 
DFT, X, using a standard FFT algorithm [28], and locate the k non-zero coefficients. Such an algorithm 
uses n samples and O(nlogn) computations. When the DFT X is known to be exactly /c-sparse and 
k « n, computing all the n DFT coefficients seems redundant. 



^As mentioned earlier, the FR\ST algorithm requires the length of the signal n to be a product of a few distinct primes. Hence, the 
comparison is for an equivalent n ^ 2^^ and k — 7000. 
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In this paper, we address the problem of designing an algorithm, to compute the fc- sparse n-point 
DFT of X for the asymptotic regime of k and n, when the support of the non-zero DFT coefficients is 
uniformly random. We would like the algorithm to have the following features: 

• it takes as few input samples M of x as possible. 

• it has a low computational cost that is a function of only the number of samples M. 

• it is applicable for the entire sub-linear regime, i.e., for all < < 1, where k = 0{n^). 

• it computes the DFT X with a probability of failure vanishing to as M becomes large. 

In the next section, we setup the notations and provide definitions of the important parameters used 
in the rest of the paper. 

B. Notation and preliminaries 



Notation 


Description 


n 


Ambient dimension of the signal x. 


k 


Number of non-zero coefficients in the DFT X. 
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Sparsity-index: /c oc n*^ , < ^ < 1. 


M 


Sample complexity: Number of samples of x used by the FFAST algorithm to compute the DFT X. 


r M/k 


Oversampling ratio: Number of samples per non-zero DFT coefficient. 


d 


Number of stages in the "sub-sampling front end" of the FFAST architecture. 


D 


Number of sub-streams of the input signal x generated in each stage 
of the "sub-sampling front end" of the FFAST architecture. 


u 


Number of samples of x per sub-stream, in the i*^ stage of the "sub-sampling front end" 
of the FFAST architecture; M = YltZl Dfi. 



TABLE I: Glossary of important notations and definitions used in the rest of the paper. The last three parameters are related 
to the "sub-sampling front end" of the proposed FFAST architecture (see Figure |4] for details). 

Modulo-operator: For integers a, A^, we use (a) at to denote the operation, a mod A^, i.e., {a)N = a 

mod A^. 

We now describe the Chinese-Remainder-Theorem (CRT) which plays an important role in our 
proposed FFAST architecture as well as in the FFAST decoder. 



Theorem 1 (Chinese-Remainder- Theorem p8]|). Suppose no,ni, . . . ,n^_i are pairwise co-prime posi- 
tive integers and N = Yi^Zo ^i- Then, every integer 'a' between and N — 1 is uniquely represented 
by the sequence tq, ri, . . . , Td-i of its remainders modulo no, ... , rid-i respectively and vice-versa. 

Further, given a sequence of remainders ro, ri, . . . , r^-i, where < < n^. Gauss's algorithm can 
be used to find an integer 'a\ such that. 



[o)n, =ri for z = 0, 1, . . . 1. 



(2) 



For example, consider the following pairwise co-prime integers no = 3, ni = 4 and n2 = 5. Then, 
given a sequence of remainders ro = 2, ri = 2, r2 = 3, there exists a unique integer 'a\ such that. 



2 = a mod 3 

2 = a mod 4 

3 = a mod 5 



(3) 



It is easy to verify that a = 38 satisfies the congruencies in (|3]). Further, a = 38 is a unique integer 
modulo = nonin2 = 60 that solves (|3]). 
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sparsity index 6 = log k/log n 
(for large enough n) 

Fig. 3: Phase transition plot for the FFAST algorithm. The plot shows the relation between the oversampling ratio r = M/k, 
and the sparsity index S for < S < 0.99, where k oc . The FFAST algorithm successfully computes the /c-sparse n-point 
DFT X of the desired n-point signal x with high probability, as long as the number of samples M is above the threshold 
given in the plot. Note that for nearly the entire sub-linear regime of practical interest, e.g. k < n^-^^, the oversampling 
ratio r < 4. For asymptotic values of k, the oversampling ratio r = 2dr], where d is the number of stages in the FFAST 
architecture and rj is the average number of samples per sub-stream normalized by the number of non-zero coefficients k. 
In Section |V-D we show that the number of stages d used in the FFAST architecture increases as S approaches 1. The 
above phase transition plot is then obtained by using the constructions in Section |V-D| and the values of dr] from Table llV 



in Section V-Cl 



III. Main Results 

We propose a novel FFAST algorithm to compute the (exactly) fc-sparse n-point DFT, X, of an n- 
point signal x. In this paper, an n-length input signal x is defined to have a fc-sparse DFT X, if X has 
at most k non-zero arbitrary-valued coefficients, whose locations are uniformly randomly distributed 
in {0, 1, . . . , n — 1}. The FFAST algorithm computes the fc-sparse n-point DFT with high probability, 
using as few as 0{k) samples of x and 0{k log k) computations. The following theorem states the main 
result of the paper. 

Theorem 2. For any value of the sparsity index < 5 < 1, and n large enough, there exists a FFAST 
algorithm with parameters (n, fc, M), where k = 0{n^), such that the FFAST algorithm can compute a 
k-sparse DFT X of an n-length input x with the following properties: 

1) Sample complexity: The algorithm needs M samples of^, where M can be as small as rk, where 
the oversampling ratio r > 1 is a small constant that depends on the sparsity index 6; 

2) Computational complexity: The computational complexity is cMlog(M), where the constant c 
is small and related to the constants associated with computing a few approximately- M -length 

fftE 

3) Probability of success: The probability that the algorithm will correctly compute the k-sparse 
DFT X is at least 1- 0(1/ M), 



^Note that when M — rk, the computational complexity is 0{k\ogk). 



Proof: We prove the theorem in three parts. In Section [V} we analyze the performance of the 



FFAST algorithm for the very-sparse regime (0 < 5 < 1/3), and in Section VI we analyze the less 



sparse regime 1/3 < 5 < 1. Lastly, in Section VII we analyze the sample and computational complexity 
of the FFAST algorithm. ■ 

Remark 3. [Oversampling ratio r] The minimum achievable oversampling ratio r of the sample 
complexity M = rk depends on the number of stages d used in the FFAST architecture. The number of 
stages d, in turn, is a function of the sparsity index 5 (recall k oc n^), and increases as 6 ^ 1 (i.e., as 
the number of the non-zero coefficients, k, approach the linear regime in n). In Sections [V| and \Vl\ we 
show how the number of stages d increases as 5 approaches 1. Table ^provides some example values 
of r and d for different values of the sparsity index 5. In Fig. [3] we plot r as a function of 5 for a 
sparsity regime of practical interest, i.e., < 6 < 0.99. 
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1/3 


2/3 


0.99 


0.999 


0.9999 


d 


3 


6 


8 


11 


14 


r 


2.45 


3.14 


3.74 


4.64 


5.51 



TABLE II: The table shows the number of subsampling stages d used in the FFAST architecture, and the corresponding 
values of the oversampling ratio r (for different values of the sparsity index S). 



IV. Fast Fourier transform using decoding on sparse-graphs 

In this section we describe our (deterministic) sub-samphng "front-end" architecture as well as the 
associated FFAST decoding 'back-end" algorithm for computing the fc-sparse n-point DFT, and we will 
connect this to the framework of decoding over sparse-graph codes. Before we delve into the details, we 
first review two background concepts from signal processing. We will use simple examples. Consider 
a 20-point discrete-time signal x = (x[0], . . . , x[19]) whose 20-point DFT X is 5-sparse. Let the 5 
non-zero DFT coefficients of x be X[l] = 1, X[3] = 4, X[5] = 1, X[10] = 3 and X[13] = 7. 
1) Aliasing: If a signal is subsampled in the time domain, its frequency components mix together, 
i.e., alias, in a pattern that depends on the sampling procedure. For example, consider uniform 
subsampling of x by a factor of 5 (see Fig. |4]) to get x^ = (x[0], x[5], x[10], x[15]). Then, the 
4-point DFT of x^ if| 

X,[0] = X[0]+X[4]+X[8]+X[12]+X[16] =0 

X,[l] = X[l]+X[5]+X[9]+X[13]+X[17] =9 

X,[2] = X[2]+X[6]+X[10]+X[14]+X[18] =3 

Xs[3] = X[3]+X[7]+X[ll]+X[15]+X[19] =4 

In general if the sampling period is N (we assume that N divides n) then 

Xs[i]= J2 ^t-^']' (4) 

where the notation {i)n/N denotes the operation i mod n/N. 



^Subsampling is usually followed by a scaling gain equal to the subsampling factor, e.g., when a signal is subsampled by 5 a scaling 
gain of 5 is applied to all the samples. In this paper we assume this to be an inherent operation of subsampling, and we will not carry 
around this gain explicitly. 
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ix[0],x[l],...,x[19]) 



Z > 



15 



15 



|4 



(a;[0],a;[5],a;[10],a;[15]) 



4-DFT 



{x[l],x[6],x[n],x[16]) 



4-DFT 



(x[0],a:[4],x[8],a:[12],x[16]) 



5-DFT 



(x[l],x[5],x[9],x[13],x[17]) 



5-DFT 



(X,[0],...,X,[3]) 



(Z,[0],...,Z,[4]) 



Fig. 4: A toy-example of the front end sub-sampling architecture of the FFAST algorithm. The input to the FFAST architecture 
is a 20-point discrete-time signal x = (x[0], . . . , x[19]). The input signal and its unit delayed version are first subsampled 
by 5 to obtain two sub-streams (D = 2), each of length /o = 4. A 4-point DFT of each sub-stream is then computed to 
obtain the observations Similarly, downsampling by 4 followed by a 5-point DFT provides the second set of 

/i = 5 observations Note that the number of samples per sub-stream /o and /i in the two different stages 

(d = 2) are pairwise co-prime and are factors of n = 20. In general, the choice of subsampling factors depend on the 
sparsity index S. For the very-sparse regime, (0 < (5 < 1/3), the subsampling factors are such that the number of samples 
per sub-stream in the different stages of the FFAST architecture are relatively co-prime factors of n (see Section |V] for 
details). For the less-sparse regime, (1/3 < (5 < 1), the number of samples per sub-stream in different stages have a more 



complicated "cyclically-shifted" overlapping co-prime structure (see Section VI for details). 



2) Shift in time: From elementary Discrete Fourier Transform properties, we know that a circular 
shift in the time domain results in a phase shift in the frequency domain. Consider a circularly 
shifted sequence x^^^ obtained from x as x^^^fi] = x[{i + 1)^]. The DFT coefficients of the shifted 
sequence are given as, X^^^'lj] = ujiX[j], where u)n = exp(27rz/n) is an n^^ root of unity. In 
general a circular shift of uq results in X^^^^fj] = uj^^^X[j]. 

A. Computing a sparse DFT is equivalent to decoding on a sparse-graph 

Using the above 20-point example, we explain how to compute the /c-sparse n-point DFT of a signal, 
using decoding over sparse graphs. Let, the input signal x be processed through the front-end of the 

FFAST architecture, shown in Fig. |4| to obtain the observations -^s[-]) and (Z^f-], 

For now, we focus only on the paths that produce the outputs Xs[-] and Zs[-]. Figures 5(a)|5(c) 



and |5(e)| show the magnitude plots of the discrete-time signal x and its subsampled versions, while 



Figures [5(b)l |5(d)| and |5(f)| show the magnitude plots of the corresponding DFTs of appropriate lengths. 
Note that the observation Xs[l] is a combination of three non-zero DFT coefficients of x, we refer 
to such observations as "multi-tons". Likewise the observations that consists of exactly one non-zero 
DFT coefficient of x are referred to as "singletons", e.g. Xs[2], while the ones that consists of all the 
zero DFT coefficients are called "zero-tons", e.g., ^^[O] or Zs[2]. Consider first that a "genie" informs 
us which observations are singletons, as well as, the location and value of the contributing non-zero 
DFT coefficient. We will later explain (in Section |IV-C| ) how to get rid of the genie. Then, the FFAST 
algorithm can recover the DFT coefficients X[1],X[3] and X[10]. After, subtracting the contribution 
of these uncovered DFT coefficients from the observations, more singleton observations are created. 
Thus, with the help of the genie, the FFAST algorithm can potentially recover all the 5 non-zero DFT 
coefficients of x via an iterative peeling procedure. 

An alternative and more convenient way to analyze this peeling procedure is to first represent the 
observations in the form of a bi-partite graph (see Fig. |6]) where the left (variable) nodes correspond 
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X[13] = 7 



JY[3]=4 



X[W] = 3 



1 



1 



X[l] = 1 

A. 



X[5] = 1 

T 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 IB 16 17 18 19 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 



(a) Magnitude plot of the n = 20 length (b) Magnitude plot of the 20-point DFT X. 
discrete-time signal x. 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 

(c) Magnitude plot of the subsampled input 
signal corresponding to the 1** sub- stream of 
stage in Fig. |4] 



X[l]+X[5]+X[13] =9 



X[10] - 



X[3] = 4 



Xs[2] X,[3] 



(d) Magnitude plot of the 4-point 
DFT of the subsampled input signal 
corresponding to the 1** sub-stream 
of stage 0. 



X[3]+X[13] = 11 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 

(e) Magnitude plot of the subsampled input 
signal corresponding to the 1** sub-stream of 
stage 1 in Fig. [4] 



X[5]+X[10] =4 

X[l] = 1 



Zs[0] Z,[l] Zs[2] Zs[3] Z,[4] 

(f) Magnitude plot of the 5-point 
DFT of the subsampled input signal 
corresponding to the 1** sub-stream 
of stage 1. 



Fig. 5: Frequency aliasing induced by the FFAST architecture for the 20-point toy example signal x. The left hand side sub- 
figures (a), (c), (e) are the magnitude plots of the time-domain signals x and the 1^^ sub-streams in each of the two stages 
respectively. The sub-figures (6), (d), (/) on right, provide the magnitude plots of the DFT of respective time-domain signals 
on the left. Note that the observations Xs[2],X5[3] and Zs[l] are singletons (marked by 'unfilled circles'), ^^[l], Zs[0] and 
Zs[3] are multi-tons, and X5[0],Z5[2] and ^s[4] are zero-tons. 



to the non-zero DFT coefficients of x (unknown and to be computed) and the right (check) nodes 
correspond to the observations generated through the FFAST architecture of Fig. |4} A check node with 
an edge degree 1 corresponds to a singleton observation, while the ones having the edge degrees equal to 
and greater than 1 correspond to zero-ton and multi-ton observations respectively. Iterative-decoding 
on the graph given in Fig. [6] can be used to compute the DFT of x from the observations. 

Thus, the FFAST architecture has transformed the problem of computing the DFT of x into that of 
decoding over a sparse bi-partite graph of Fig. |6} 
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□ (X,[0],X,[0]) 
(X,[1],X,[1]) 
(X,[2],X,[2]) 
(X,[3],X,[3]) 



h fo 



(zMZs[o]) 

iZsll],Z.,[l]) 

□ {ZMZ.A2]) \- f, = 5 
(Z,[3],Z,[3]) 

□ (Z,[4],Z44]) _ 



Fig. 6: A 2-left regular degree bi-partite graph representing the relation between the unknown DFT coefficients and the 
observations obtained through the FFAST architecture shown in Fig. |4j for the 20-point example signal x. Variable (left) 
nodes correspond to the DFT coefficients and the check (right) nodes are the observations. The observation at each check 
node is a 2-dimensional complex-valued vector. 



Erasure Channel 


Sparse DFT 


1. Explicitly designed sparse-graph code. 


1. Implicitly designed sparse-graph code induced by 
sub-sampling. 


2. n — k correctly received packets. 


2. n — k zero DFT coefficients. 


3. A;-erased packets. 


3. k unknown non-zero DFT coefficients 


4. Peeling-decoder recovers the values of 
the erased packets using 'singleton' check 
nodes. The locations of which packets are 
erased are known. 


4. Peeling-decoder recovers both the values and the locations of 
the non-zero DFT coefficients using 'singleton' check nodes. The 
locations of the non-zero DFT coefficients are not known. This 
results in a 2x cost (i.e., D = 2) in the sample complexity. 


5. Codes based on regular-degree bipartite 
graphs are near-capacity-achieving. More 
efficient, capacity-achieving 
irregular-degree bipartite graph codes can 
be designed. 


5. The FFAST architecture based on uniform subsampling can 
induce only left-regular degree bi-partite graphs. 



TABLE III: Comparison between decoding over a sparse-graph code for a packet erasure channel and computing a sparse 
DFT using the FFAST architecture. 



B. Connection to coding for packet erasure channels 

The problem of decoding over sparse bi-partite graphs has been well studied in the coding theory 
literature. In this section we draw an analogy between decoding over sparse-graph codes for a packet 
erasure channel and decoding over bi-partite graphs induced by the FFAST architecture. 

Consider an (n, n — m) packet erasure code. Each n-length codeword consists of {n — m) information 
packets and m parity packets. Suppose that the code is used over an erasure channel that uniformly at 
random drops some k number of packets. A bi-partite graph representation of the parity check matrix 



of the code, after subtracting the contribution of the correctly received packets, is shown in Fig. |7(a) 
In Table |ni| we provide comparison between decoding over bi-partite graphs of Fig. |7(a)| and Fig. |7(b) 
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symbols/packets 



Parity 
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Non-zero 
DFT coefficients 



Aliased 
Frequency bins 



(a) Bi-partite graph, after subtracting the 
contribution of the correctly received n — k 
packets, corresponding to a parity check 
matrix of a sparse-graph code designed for 
an erasure channel. 



(b) Bi-partite graph representing the con- 
nections (aliasing) among the non-zero DFT 
coefficients due to subsampling, and the ob- 
servations obtained by the FFAST decoder. 



Fig. 7: Comparison between the bi-partite graphs corresponding to the parity check matrix of a sparse-graph code for an 
erasure channel and a graph induced by the FFAST architecture. 



Thus, the problem of decoding over bi-partite graphs induced by the FFAST architecture is closely 
related to the decoding of sparse-graph codes for an erasure-channel. We use this analogy: a) to design 
a set of uniform sub-sampling patterns that induce 'good' left-regular degree sparse-graph codes; and 
b) to formally connect our proposed Chinese-Remainder-Theorem based aliasing framework to random 
sparse-graph codes constructed using a balls-and-bins model, and analyze the convergence behavior of 
our algorithm using density evolution techniques. 



C. FFAST Decoder 

In Section |IV-A| we described an iterative peeling-decoder to compute the sparse DFT X for a toy 
example with the help of a "genie" that informed the decoder if a check node is a zero-ton, singleton or 
multi-ton, as well as, the location of the corresponding non-zero DFT coefficient in case of a singleton 
check node. In this section, we show how to get rid of the "genie" by using the additional observations 
Xs['] and Zs[-]. The resulting iterative decoding algorithm is referred to as the 'FFAST-decoder'. First, 
we make the following important observation: if a check node is a singleton, then by computing the phase 
of the ratio of its observations, e.g., ^Z{Xs[2]/Xs[2]) = 10, we can identify the index of the non-zero 
DFT coefficient, 10 in this example, connected to this check node and the first observation, Xs[2] = 3, 
provides the value of the non-zero DFT coefficient. In general, as shown in [29], identifying the support 
or the location of a non-zero DFT coefficient connected to a singleton check node from multiple (two in 
this example) observations is equivalent to the problem of estimating the frequency of a single sinusoid 
from its time domain samples. The problem of estimating a single sinusoid from noiseless or noisy time 
domain samples goes back to the signal processing literature from several decades ago, and the ratio test 
discussed above is a special case of well-known techniques used in [10] and [ |30J for a similar problem. 



13 



In particular, this has been used in [29| to show that by having twcj^ observations per check node, one 
can identify whether the check node is a zero-ton, singleton or multi-ton with overwhelmingly high 
reliability. We will therefore assume henceforth that by processing the observations at each check node, 
we can reliably identify if it is a zero-ton, a singleton or a multi-ton. Additionally if a check node is a 
singleton, then we know the value as well as the location of the corresponding DFT coefficient. 
The FFAST decoder then repeats the following steps (also see pseudocode in Algorithm [T]): 

1) Select all the edges in the graph with right degree 1 (edges connected to singletons). 

2) Remove these edges from the graph as well as the associated left and right nodes. 

3) Remove all the other edges that were connected to the left nodes removed in step-2. When a 
neighboring edge of any right node is removed, its contribution is subtracted from that check node. 

Decoding is successful if, at the end, all the edges have been removed from the graph. It is easy to verify 
that performing the peeling procedure on the example graph of Fig. [6] results in successful decoding, 
with the coefficients being uncovered in the following possible order: Xio, X3, Xi, X5, X13. 

One can also think of the FFAST decoder as a message-passing algorithm. The check nodes pass 
a message!^ saying either "no-information" (in the case of a zero-ton or multi-ton), or the value and 
identity of the variable node (in the case of a singleton). The variable nodes in turn just forward this 
message to the other check nodes. One round of messages passed back and forth between variable nodes 
and check nodes is called an iteration of the decoder. 

We have illustrated how computing a sparse DFT can be translated into a decoding problem over 
sparse-graphs. Clearly the success of this fast decoder will depend on the properties of the resulting 
graph. How to get "good" graphs that will result in a low complexity as well as a fast and reliable 
decoding will be addressed in the next section. 

V. Performance analysis of the FFAST algorithm for the very-sparse 

{k (xn\ < 5 < 1/3) REGIME 

In the previous section, we showed that the problem of computing a fc- sparse n-point DFT of a 
signal can be transformed into a problem of decoding over sparse bipartite graphs using the FFAST 
architecture. In this section, we describe how to choose a set of uniform sub-sampling patterns, guided 



by the CRT, to induce a good sparse-graph code. As shown in Section [rV-B[ the FFAST decoding process 
is closely related to the decoding procedure on sparse-graph codes designed for erasure channels. From 
the coding theory literature, we know that there exist several sparse-graph code constructions that are 
low-complexity and capacity-achieving for the erasure channels. The catch for us is that we are not 
at liberty to use any arbitrary bi-partite graph, but can choose only those graphs that can be induced 
through our proposed subsampling. Next, we show that a deterministic bi-partite graph construction 
based on the Chinese-Remainder-Theorem (CRT), in conjunction with a uniformly random support of 
the non-zero DFT coefficients, creates sparse-graph codes that: a) have all the good properties that are 
sufficient for reliable decoding; and b) can be induced using the FFAST subsampling architecture. 

We describe two ensembles of bi-partite graphs, the first based on a "balls-and-bins" model, and the 
second using a deterministic construction based on the CRT. Later, in Lemma |4} we show that the two 
ensembles are equivalent. 

Let = {/o, . . . , /d-i}, be a set of pairwise co-prime integers and m = Yl^Zo f^- explained 
in Table [1} the integers //s are the number of samples per sub-stream in the d stages of the FFAST 
architecture (see Fig. |2]). The integers //s are approximately equal and we use F to denote this value. 

^For the case when the signal has an approximately sparse DFT, more observations per check node will be needed to robustly identify 
whether the check node is a zero-ton, singleton or multi-ton. I.e., in Fig. |2] D will no longer be 2 as in the exactly sparse case. 

^In this paper we focus on a low-complexity message passing algorithm where the messages from the check nodes to the variable nodes 
are hard-decision: either "singleton" or "no-information". One can implement "softer" and more complex variants of this message-passing 
algorithm, especially when dealing with the noisy case. 
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Algorithm 1 FFAST Algorithm 



1: Inputs: 

• A discrete time signal x of length n, whose n-point DFT X has at most k non-zero coefficients, 
where k (x n^, 

• The subsampling parameters of the FFAST architecture (see Fig. [2]): 

- the number of stages d. 

- the number of samples per sub-stream in each of the d stages = {/o, /i, . . . , fd-i}- 
are chosen as discussed in Sections |V] and [Vll 

2: Output: An estimate of the fc-sparse n-point DFT X. 

3: Subsampling: The input signal x and its unit delayed version, are uniformly subsampled by 
{^/ fi}i=o^ to get = 2 sub-streams of the sampled input data per stage (see Fig. [2]). The total 
number of samples M = 2 Yltlo ft- 

4: Compute short DFTs: 

• Compute an /^-length DFT of each of the 2 sub-streams in the d stages for i = 0, 1, . . . , (i — 1. 

• This results in a bi-partite graph, with k variable nodes on the left and m = ^fjo check 
nodes on the right, where each check node has two observations (see Fig. |6]for an example). 

5: FFAST Decoding: Set the initial estimate of the n-point DFT X = 0. Let £ denote the number of 

iterations performed by the FFAST decoder. 
6: for each iteration do 
7: for each check node do 

8: Let (2/ [0], ?/[!]) be the two observations of the check node. 

9: if y[0] =y[l]=0 then 
10: the check node is a zero-ton, 

11: else 

12: singleton-test: If the location estimate loc = ^ — ^?/[0]), is an integer between and 

n — 1, then the check node is a singleton. 
13: Peeling: If the check node is a singleton set X[/oc] = ^[0]. Subtract the contribution of 

(peel-off) X[/oc] from all the neighboring check nodes. 
14: else 

15: the check node is a multi-ton. 

16: end if 
17: end for 
18: end for 



More precisely, fi = F + 0(1) forz = 0,...,(i— 1, where F is an asymptotically large number. The 
0(1) perturbation term in each fi is used to obtain a set of co-prime integers near F. Note, that the 
total number of samples used by the FFAST algorithm is M = 2dF + 0(1) (see Fig. |2]). In this section, 
we use F = rjk, for some constant ry, which results in an order optimal sample complexity M = 0(/c), 
but the constructions and results trivially extend to the case when M > 0{k). 
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A. Randomized construction based on the ''Balls-and-Bins'' model: C\{J^^m) 

We construct a bi-partite graph with k variable nodes on the left and m check nodes on the right. The 
bipartite graphs that we consider in this paper have each variable node v connected to exactly d check 
nodes, while the edge degree of the check nodes is variable, i.e., left-regular degree bi-partite graphs. 
An example graph from an ensemble (J', m), for J" = {4, 5}, d = 2, = 5 and m = 9 is provided 
in Fig. [6j More generally, the ensemble Cl{J^,m) of (i-left regular degree bipartite graphs constructed 
using a "balls-and-bins" model is defined as follows. Partition the set of m check nodes into d subsets 
with the i^^ subset having fi check nodes. For each variable node, choose one neighboring check node 
in each of the d subsets, uniformly at random. The corresponding rf-left regular degree bipartite graph 
is then defined by connecting the variable nodes with their neighboring check nodes by an undirected 
edge. 

An edge e in the graph is represented as a pair of nodes e = c}, where v and c are the variable 
and check nodes incident on the edge e. By a directed edge e we mean an ordered pair [v, c) or (c, v) 
corresponding to the edge e = {v,c}. A path in the graph is a directed sequence of directed edges 
el, . . . , such that, if = {ui, u[), then the u[ = uij^i for i = 1, . . . , t — 1. The length of the path is 
the number of directed edges in it, and we say that the path connecting ui to Ut starts from ui and 
ends at Ut. 

1) Directed Neighborhood: The directed neighborhood of depth £ of e = (^,c), denoted by A/g, 
is defined as the induced subgraph containing all the edges and nodes on paths el , . . . , starting at 
node V such that el ^ e. An example of a directed neighborhood of depth ^ = 2 is given in Fig. [8j If 
the induced sub-graph corresponding to the directed neighborhood A/^ is a tree then we say that the 
neighborhood of the edge e is tree-like. 



B. Ensemble of bipartite graphs constructed using the Chinese-Remainder-Theorem (CRT): C|(J^, n) 

Let n = Y^iZl fi. The ensemble C^J^, n) of rf-left regular degree bipartite graphs, with k variable 
nodes and m check nodes, is defined as follows. Partition the set of m check nodes into d subsets with 
the i^^ subset having fi check nodes (see Fig. [6] for an example). Consider a set X of k integers, where 
each element of the set X is chosen uniformly at random, with replacement, between and n — 1. Assign 
the k integers from the set X to the k variable nodes in an arbitrary order. Label the check nodes in the 
set i from to — 1 for alH = 0, . . . , — L A rf-left regular degree bi-partite graph with k variable 
nodes and m check nodes, is then obtained by connecting a variable node with an associated integer v 
to a check node {v)f. in the set for i = 0, . . . , — L The ensemble C2 (J-*, n) is a collection of all 
the (i-left regular degree bipartite graphs induced by all possible sets X. A uniformly random choice of 
integers in the set X, implies that all the graphs in the ensemble CKJ', n) occur with equal probability. 

Note that the modulo rule used to generate a graph in the ensemble C2{J^,n) is same as the one 
used in equation (|4]) of Section [IV) Thus, the FFAST architecture of Fig. |4] described in Section |IV 



generates graphs from the CRT ensemble C^J-'^ n), where the indices X of the k variable nodes are the 
locations of the non-zero DFT coefficient^ of the signal x. 

Lemma 4. The ensemble of bipartite graphs C\{J^^m) is identical to the ensemble C2{J-'^n). 

Proof: It is trivial to see that C|(J^, n) C (J^, m). Next we show the reverse. Consider a graph 
Qi G Cl{T^m). Suppose, a variable node v G Qi is connected to the check nodes {a^}^~o^. Then, using 
the CRT, one can find a unique integer q between and n — 1 such that (g)/. = Vi = 0, . . . , — L 
Thus, for every graph Qi G (J-*, m), there exists a set X of k integers, that will result in an identical 
graph using the CRT based construction. Hence, Cl{J^^m) C2(J^, n). ■ 



The integers in the set T are chosen uniformly at random, with replacement, between and n — 1. A set X with repeated elements 
then corresponds to a signal with fewer than k non-zero DFT coefficients. 
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Fig. 8: Directed neighborhood of depth 2 of an edge e = (v,c). The dashed Hnes correspond to nodes/edges removed at 
the end of iteration j. The edge between v and c can be potentially removed at iteration j + 1 as one of the check nodes 
is a singleton (it has no more variable nodes remaining at the end of iteration j). 



Next, we analyze the performance of the peehng-decoder over a random choice of a graph from 
the ensemble C\{J^,m), which, using the Lemma |4| along with the fact that all the graphs in both 
the ensembles occur with equal probability, provides a lower bound on the probability of successful 
decoding of the FFAST decoder over graphs in the ensemble C2{J^,n). 

C. Performance analysis of the peeling-decoder on graphs from the ensemble C\{J^^m) 

In this section we analyze the probability of success of the peeling-decoder, over a randomly chosen 
graph from the ensemble Cl{J^,m), after a fixed number of iterations I. Our analysis follows exactly 
the arguments in |31] and p2J . Thus, one may be tempted to take the results from [ [3T| "off-the-shelf. 
However, we choose here to provide a detailed analysis for two reasons. First, our graph construction 



in the ensemble Cf(J', m) is different from that used in [31 1, which results in some fairly important 
differences in the analysis, such as the expansion properties of the graphs, thus warranting an independent 
analysis. Secondly we want to make the paper more self-contained and complete. 

We now provide a brief outline of the proof elements highlighting the main technical components 
needed to show that the peeling-decoder decodes all the non-zero DFT coefficients with high probability. 
• Density evolution: We analyze the performance of the message-passing algorithm, over a typical 
graph from the ensemble, for I iterations. First, we assume that a local neighborhood of depth 21 of 
every edge in a typical graph in the ensemble is tree-like, i.e., cycle-free. Under this assumption, all 
the messages between variable nodes and the check nodes, in the first ^ rounds of the algorithm, are 
independent. Using this independence assumption, we derive a recursive equation that represents 
the expected evolution of the number of singletons uncovered at each round for this typical graph. 



Convergence to the cycle-free, case: Using a Doob martingale as in pT| , we show that a random 
graph from the ensemble, chosen as per nature's choice of the non-zero DFT coefficients, behaves 
like a "typical" graph, i.e., 2^-depth neighborhood of most of the edges in the graph is cycle- 
free, with high probability. This proves that for a random graph in (J^, m), the peeling-decoder 
decodes all but an arbitrarily small fraction of the variable nodes with high probability in a constant 
number of iterations, I. 

Completing the decoding using the graph expansion property: We first show that if a graph is an 



"expander" (as will be defined later in Section V-C3), and the peeling-decoder successfully decodes 



all but a small fraction of the non-zero DFT coefficients, then it decodes all the non-zero DFT 
coefficients successfully. Next, we show that a random graph from the ensemble Cf(J', m) is an 
expander with high probability. 
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1 ) Density evolution for local tree-like view; In this section we assume that a local neighborhood 
of depth 2i of every edge in a graph in the ensemble is tree-like. Next, we define the edge-degree 
distribution polynomials of the bipartite graphs in the ensemble as X{a) = Xli^i and p{a) = 

Xli^i Pi^^~^^ where (resp. pi) denotes the probability that an edge of the graph is connected to a left 
(resp. right) node of degree i. Thus for the ensemble Cl{T,m), constructed using the balls-and-bins 
procedure, \{a) = a^"^ by construction. Further, as shown in Appendix |a| the edge degree distribution 
p{a) = exp(-(l - a)/r]). 

Let pj denote the probability that an edge is present after round j of the peeling-decoder, then po = 1. 
Under the tree-like assumption, the probability Pj+i, is given as, 

p,+i = A(l-p(l-p,)) J = 0,1,...,^-!. (5) 

Equation ([5]) can be understood as follows (also see Fig. [8]): the tree-like assumption implies that, up to 
iteration messages on different edges are independent. Thus, the total probability, that at iteration j + l, 
a variable node v is decoded due to a particular check node is given by p{l—pj) = Xli^i Pi{^ ~PjY~^ 
and similarly the total probability that none of the neighboring check nodes decode the variable node 
V is pj+i = A(l — p(l — Pi)). Specializing equation ([5]) for the edge degree distributions of C^{T,m) 
we get, 

P,+i= (l-e-^)'"\ Vj- = 0,1,...,^-1 (6) 

where = 1- The evolution process of ([6]) asymptotically (in the number of iterations I) converges to 
for appropriate choice of the parameter 77, e.g., see Table |IV} 



d 


2 


3 


4 


5 


6 


7 


8 


9 




1.0000 


0.4073 


0.3237 


0.2850 


0.2616 


0.2456 


0.2336 


0.2244 


drj 


2.0000 


1.2219 


1.2948 


1.4250 


1.5696 


1.7192 


1.8688 


2.0196 



TABLE IV: Minimum value of 77, the average number of bins or check nodes per stage per variable node, required for the 
density evolution of ([6]) to converge asymptotically. The threshold value of r] depends on the number of stages d. Although, 
the minimum threshold of r] decreases with increasing d, the total number dr]k) of bins or check nodes required for the 
convergence of ([6]) is a not a monotonic function in d. 



2) Convergence to cycle-free case.* In Lemma [5] we show that: a) the expected behavior over all 
the graphs in the ensemble (J^, m) converges to that of a cycle-free case, and b) with exponentially 
high probability, almost all the edges in a random graph in the ensemble have a tree-like neighborhood, 
while the proportion of the edges that are not decoded after £ iterations of the peeling-decoder is tightly 
concentrated around pi, as defined in ([6]). 

Lemma 5 (Convergence to Cycle-free case). Over the probability space of all graphs Cl{J^,m), let 
Z be the total number of edges that are not decoded after £ (an arbitrarily large but fixed) iterations 
of the peeling-decoder over a randomly chosen graph. Further let p£ be as given in the recursion (|6]). 
Then there exist constants (5 and 7 such that for any ei > and sufficiently large k we have 

(a) E[Z]<kdpe + ^; (7) 

(6) Pr{\Z- kdpt\ > kde) < 2e-^''\ (8) 

where j > is a constant. 

Proof: Please see Appendix |D] ■ 
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3) Successful Decoding using Expansion; In the previous section we showed that with high 
probabihty, i.e., with probabihty approaching 1 exponentially in k, the peeling-decoder decodes all but 
an arbitrarily small fraction of variable nodes. In this section, we show how to complete the decoding if 
the graph is a "good-expander". Our problem requires the following definition of an "expander- graph", 
which is somewhat different from conventional notions of an expander-graph in literature, e.g., edge 
expander, vertex expander or spectral expander graphs. 

Definition 6 (Expander graph). A d-left regular degree bipartite graph constructed as described in 



Section V-A is called an {a^/3^d) expander, if for all subsets S, of variable nodes, of size at most ak. 



there exists a right neighborhood of S, i.e., Ni{S), that satisfies |A^^(S')| > (5\S\for some i = 0, . . . , d—1. 

In the following lemma, we show that if a graph is an expander, and if the peeling-decoder successfully 
decodes all but a small fraction of the non-zero DFT coefficients, then it decodes all the non-zero DFT 
coefficients successfully. 

Lemma 7. Consider a graph from the ensemble (J^, m), with \T\ = d, that is an (a, 1/2, rf) expander 
for some a > 0. If the peeling-decoder over this graph succeeds in decoding all but at most ak variable 
nodes, then it decodes all the variable nodes. 

Proof: See Appendix [B] ■ 
In Lemma [s} we show that most of the graphs in the ensemble (J^, m) are expanders. 

Lemma 8. Consider a random graph from the ensemble Cl{J^^m), where d>3. Then, all the subsets 
S of the variable nodes, of the graph, satisfy mSix{\Ni{S)\}'^lQ > \S\/2, 

a) with probability at least 1 — e-e^iog(m//c)^ ^^^^ of size \S\ = ak, for small enough a > and 
some 6 > 0. 

b) with probability at least 1 — 0{l/m), for sets of size \S\ = o{k). 

Proof: See Appendix [C] ■ 
The condition > 3 is a necessary condition for part (6) of Lemma [8} This can be seen as follows. 
Consider a random graph from the ensemble Cf(J', m), where IJ^I = rf. If any two variable nodes in 
the graph have the same set of d neighboring check nodes, then the expander condition, for the set S 
consisting of these two variable nodes, will not be satisfied. In a bi-partite graph from the ensemble 
(J^, m), there are a total of 0(/c^) distinct sets of d check nodes. Each of the k variable nodes chooses 
a set of d check nodes, uniformly at random and with replacement, from the total of 0{k'^) sets. If 
we draw k integers uniformly at random between to n — 1, the probability Pr{k]n) that at least two 
numbers are the same is given by, 

Pr{k]n) ^ l-e-^'/2^. (9) 



This is also known as the birthday paradox or the birthday problem in literature p3| |. For a graph from 
the ensemble (J^, m), we have n = 0{k^). Hence, if the number of stages d <2, there is a constant 
probability that there exists a pair of variable nodes that share the same neighboring check nodes, in 
both the stages, thus violating the expander condition. 

Theorem 9. The peeling-decoder over a random graph from the ensemble C\{J^^ m), where d > 3 and 
F r]k: 

a) successfully uncovers all the variable nodes with probability at least 1 — 0(1 /m); 

b) successfully uncovers all but a vanishingly small fraction, i.e., o{k), of the variable nodes with 
probability at least 1 — e-«^^iog(m//c)^ y-^^ ^^^^ e > 0, 



for an appropriate choice of the constant r] as per Table IV 
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Proof: Let Z be the number of edges not decoded by the peeling-decoder in £ (large but fixed 
constant) iterations. Then, from recursion ([6]) and Lemma^for an appropriate choice of the constant rj 
(as per Table IV), Z < ak, for an arbitrarily small constant a > 0, with probability at least 1 — e~^^i^. 
The result then follows from Lemmas [S| and ■ 



D. Performance of the FFAST-decoder over graphs in C2 {J^, n} for k oc n^, for {0 < 5 < 1/3). 

In earlier sections, we discussed at length a simple iterative peeling-like decoding over random graphs 
constructed using a balls-and-bins model as described in Section |V-A[ In this section we connect this 



oc n 



and 



discussion to the problem of computing a /c-sparse n-point DFT of a signal, where k 
{0<S< 1/3). 

Consider a = 3 stage FFAST architecture, i.e., ^^^i/o, /i, /2}. Let F = r]k, where r] is an 



IV 



in Section 



V-Cl), Til = rii^o/^ ^ ^ some 



appropriately chosen constant (see Table 

integer V > I, i.e., k oc n^, and < 5 < lj3. As V increases, 5 approaches 0. 

For 7^ = 1, the CRT guarantees that every integer j between and n — 1 is uniquely represented by a 
triplet ((j) /q , {j) , (i)/2). However for 7^ > 1, such a statement does not hold. For example, when V = 2, 
for every triplet (ro, ri, there are exactly two numbers between and n — 1 that have (ro, ri, as 
remainders modulo z = 0, 1, 2. Hence, the only difference between V = I and P > 1 in terms of the 
resulting bi-partite graph is whether or not multiple variable nodes (the non-zero DFT coefficients) have 
identical neighboring check nodes in all the 3 stages. Recall, in the balls-and-bins ensemble Ci{T,m) 



described in Section |V-A[ we did allow for multiple variable nodes to share an identical set of check 
nodes. Despite this, we showed in Theorem |9] that the peeling-decoder successfully decodes the values 
of all the variable nodes with high probability. As a result, the proof of Theorem [2] for the very-sparse 
regime then follows from Lemma |4] and Theorem [9| □ 



VI. Performance analysis of the FFAST algorithm for the less-sparse 

(k (xn\ 1/3 < 5 < 1) REGIME 

In the previous section, we analyzed the performance of the FFAST algorithm for the very-sparse 
regime (k oc n^, < 5 < 1/3). Recall that for the very-sparse regime, the integers in the set = 
{/o, . . . i.e., the number of samples per sub-stream in the d different stages, are pairwise co- 

prime. For the less-sparse regime (k oc n^, 1/3 < 5 < 1) the relation between the integers //s is bit 
more involved. So, for ease of exposition in this section, we describe the achievable FFAST construction 
for a special case of /c oc n^/^, i.e., 5 = 2/3, which generalizes to the regimes 5 = 1 — l/rf, for integer 



values of (i > 3, through an induction argument. Later in Section |VI-C| we show how to achieve the 
intermediate values of S. To summarize, our approach for the less-sparse regime is as follows. 

• First we analyze the performance of the FFAST algorithm for 5 = 2/3. Then, in Section VI-B, we 
provide a brief sketch of how to generalize the FFAST architecture to any S = 1 — 1/d, for integer 
values of d> 3. This covers the range of values of (^ = 2/3, 3/4, 



In Section |VI-C[ we show how to achieve the intermediate values of 5, thus covering the entire 
range of the sparsity index 1/3 < 6 < 1. 



A. Less-sparse regime of 5 = 2/3 

Let {Vo,Vi,V2}, be a set of pairwise co-prime integers such that Vi = F + 0(1), i = 0, 1,2, for 
some large integer F. Set A; oc and n = IlLo^^' i-^- ^ ^ ^1^^' /o = ^0^1, fi = V1V2 
and /2 = 
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fo = VoVi 



fi = V1V2 





□ 








□ 




- T2 


la □ 


□ 




/2 = V2V0 



Fig. 9: A bi-partite graph with k variable nodes and m = ^^^q fi check nodes, constructed using a balls-and-bins model. 
The check nodes in each of the 3 sets are arranged in a matrix format, e.g., the fo check nodes in the set are arranged in 
Vo rows and Vi columns. A check node in row tq and column ri in the set 0, is indexed by a pair (ro, ri) and so on and 
so forth for all the other check nodes. Each variable node chooses a triplet (ro,ri,r2), where is between and Vi — I 
uniformly at random. A 3-regular degree bi-partite graph is then constructed by connecting a variable node with a triplet 
(ro,ri,r2) to the check nodes (ro,ri), (ri,r2) and (r2,ro) in the three sets of the check nodes respectively. 



1 ) Balls-and-Bins construction; We construct a bi-partite graph with k variable nodes on the left 
and m = Yl'i=o fi^ check nodes on the right (see Fig. ^ using balls-and-bins model as follows. Partition 
the m check nodes into 3 sets/stages containing /o, /i and /2 check nodes respectively. The check nodes 
in each of the 3 sets are arranged in a matrix format as shown in Fig. [9| e.g., fo check nodes in the 
set are arranged in Vo rows and Vi columns. A check node in row ro and column ri in the set 0, 
is indexed by a pair (ro, ri) and so on and so forth for all the other check nodes. Each variable node 
chooses a triplet (ro,ri,r2), where r^ is between and Vi — I uniformly at random. The triplets are 
chosen with replacement and independently across all k variable nodes. A 3-regular degree bi-partite 
graph with k variable nodes and m check nodes is then constructed by connecting a variable node with 
a triplet (ro,ri,r2) to the check nodes (ro, ri), (ri, r2) and (r2,ro) in the three sets of check nodes 
respectively, e.g., see Fig. [9| 

2) Connection to the CRT based bi-partite graphs induced by the FFAST architecture. Each 
variable node is associated with an integer v between and n — 1 (location of the DFT coefficient). 
As a result of the subsampling and computing a smaller DFTs in the FFAST architecture (see Fig [2]), a 
variable node with an index v is connected to the check nodes (v) , (v) and (v) in the 3 stages, in the 
resulting aliased bi-partite graph. The CRT implies that v is uniquely represented by a triplet (ro, ri, r2), 
where r^ = {v)p.. Also, {{v)f.)p. = (v)^. = r^, for all i = 0, 1, 2. Thus, the FFAST architecture induces 
a 3-regular degree bi-partite graph with k variable nodes and m check nodes, where a variable node with 
an associated triplet (ro, ri, r2) is connected to the check nodes (ro, ri), (ri, r2) and (r2, ro) in the three 
sets respectively. Further, a uniformly random model for the support ^ of a non-zero DFT coefficient, 
corresponds to choosing the triplet (ro, ri, r2) uniformly at random. Thus, the CRT based construction, 
induced by the FFAST architecture, is equivalent to the balls-and-bins construction discussed in the 
Section IVEAll 
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Following the outline of the proof of Theorem |2] (provided in Section [V]), one can show the following, 
for the balls-and-bins construction of Section IVI-Ali 

1) Density evolution for the cycle-free case: Assuming a local tree-like neighborhood derive a recursive 
equation (similar to equation [6]) representing the expected evolution of the number of singletons 
uncovered at each round for a "typical" graph from the ensemble. 

2) Convergence to the cycle-free case: Using a Doob martingale show an equivalent of Lemma |5] for 
the less-sparse regime, where the number of check nodes in the 3 different stages /o, /i and /2 are 
not pairwise co-prime. 

3) Completing the decoding using the graph expansion property: A random graph from the ensemble 
is a good expander with high probability. Hence, if the FFAST decoder successfully decodes all 
but a constant fraction of variable nodes, it decodes all the variable nodes. 

The analysis of the first two items for the less-sparse regime is similar in spirit to the one in Section [V} 
and will be skipped here. However, the analysis of the third item will be described here as there are some 
key differences, mainly arising from the nature of the overlapping co-prime number of check nodes in 
the bi-partite graphs for the less-sparse regime in contrast to the very-sparse regime. In Section [V} for 
the very-sparse regime we showed (in Lemma [8]) that the bottleneck failure event is not being able to 
decode all the DFT coefficients. In this section, we analyze this bottleneck failure event for the case of 
the less-sparse regime. In particular, we show that if the FFAST decoder has successfully decoded all 
but a small constant number of DFT coefficients, then it decodes all the DFT coefficients successfully 
with high probability. 




Fig. 10: A 3D visualization of a bipartite graph from the ensemble CK^, n) corresponding to the less-sparse regime of 
5 = 2/3. Recall that for 5 = 2/3, {Vj}'j^Q are pairwise co-prime integers, such that Vi ~ F, i = 0,1,2, where F is 
a large integer. The set T = {/o,/i,/2}, where /o = VqVi, fi = V1P2 and /2 = V2V0. The parameters /c ex F^ and 
n = ni=o ^ ^ n'^^^- A variable node with an associated triplet (ro, ri, r2) is represented by a 'ball' at the position 

(ro, ri, r2). The /o check nodes in the stage- 1 of the bi-partite graph are represented by 'blue' squares and likewise the ones 
in /i are 'green' and the check nodes in stage /2 are 'red'. All the neighboring check nodes of a variable node, e.g., 61, 
are multi-ton iff there is at least one more variable node along each of the three directions Rq , Ri and R2 . The green and 
red neighboring check nodes connected to the ball 62 are multi-tons, while the blue neighboring check node is a singleton 
since there are no other variable nodes along the R2 direction of 62- 
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3) Decoding all the variable nodes using the expansion properties of the CRT construction; 

Consider an alternative visualization of a random bi-partite graph from the ensemble C2 (J-', n) as shown 
in Fig. [Toj A variable node associated with a triplet (ro, ri, is represented by a ball at the position 
(^0,^1,^2)- The plane Rq-Ri corresponds to the check nodes in the stage /o, in a sense that all the 
variable nodes that have identical (ro, ri) but distinct r2 are connected to the check node (ro, ri) and so 
on. Similarly the planes R1-R2 and R2-R0 correspond to the check nodes in stages /i and /2 respectively. 
Thus, a variable node with co-ordinates (ro,ri,r2) is connected to multi-ton check nodes, if and only 
if there exist variable nodes with co-ordinates (ro,ri,r2), (ro,ri,r2) and (ro,r'i,r2) (see Fig. 10), i.e.. 



one neighbor in each axis. The FFAST decoder stops decoding if all the check nodes are multi-tons. 
Next, we find an upper bound on the probability of this 'bad' event. 

Consider a set S of variable nodes such that IS'I = 5, where 5 is a small constant. Let Es be an 
event that all the neighboring check nodes of all the variable nodes in the set S are multi-tons, i.e., the 
FFAST decoder fails to decode the set S. Also, let E be an event that there exists such a set. We first 
compute an upper bound on the probability of the event Es, and then apply a union bound over all (^) 
sets to get an upper bound on the probability of the event E. 

Each variable node in the set S chooses an integer triplet (ro,ri,r2) uniformly at random in a cube 
of size VoxVixV2' Let p^ax denote the maximum number of distinct values taken by these s variable 
nodes on any of the Ri axis. The FFAST decoder fails to decode the set S if and only if all the variable 
nodes have at least one neighbor along each of the 3 directions Rq.Ri, R2 (see Fig. [10]). This implies 
that s > 4pniax- Also, Pmax > 1, i-^-, s > 8, siucc by the CRT all the variable nodes s (with distinct 
associated integers) cannot have an identical triplet (ro,ri,r2). An upper bound on the probability of 
the event Es is then obtained as follows: 



-) 



3s 



(a) 
< 



(4f) 



3s 



s/4y 

F Y 

s/a) 



(10) 



se 



1/3 \ 9^4 



4F 



(11) 



where in (a) we used (P) < (pyql) < {pe/qf. Then, using a union bound over all possible (^) sets 



we get: 



Pr{E) < Pr{Es) 



< 



se 



1/3 \ 9^/4 



4F 

0(l/m), 



ke 
s 



(12) 



where in the last inequality, we used 5 > 8, oc and m = O(F^). 

Thus, the FFAST decoder decodes all the variable nodes with probability at least 1 — 0(l/m). 



B. Sketch of proof for 5 = 1 — 1 /rf, where d > 3 is an integer 

Consider a rf- stage FFAST architecture with the following parameters. Let {Vi}% 



pairwise co-prime integers such that Vi = F + 0(1) 



0,1,. 



be a set of 
1, for some large integer F. Set 



-1 

:0 ' 
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(r'o,r'i,r2,r3) 



(ri,r3) 



□ 
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□ 





□ 




□ 
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- 7^2 



/2 = 7^27^3 



Fig. 11: Let \Vi\\^^ be a set of pairwise co-prime integers, such that 7^^ ^ F, i = 0, 1, 2, and 7^3 ^ F^ for some a > 0. 
Also, let k oc F^+^, n = fli^o /o ~ ^o^^s, /i = T^iT^s and /2 = V^Pz- The check nodes in each of the 3 sets are 

arranged so that a j^^ check node in the set i belongs to the row (^2)vi and the column {^j^v-i- A variable node with an 
associated integer v is uniquely represented by a quadruplet (ro, ri, r2, ra) where Ti = {v)'p.^ i = 0, 1, 2, 3, and is connected 
to the check node (vi^rs) in set i. Thus, the resulting bipartite graph is a union of V3 disjoint bi-partite graphs as shown 
in the figure. 



k oc F^-i and n = flto i-^- ^ ^ n^^-^)/^. Also, let /, = flf f "^^ ^(j)^' for i = 0, . . . , - 1. The 
expansion property, of this construction for any value of d can be shown as follows: 

• Decoding all the variable nodes: For = 3, the worst case event is, the FFAST decoder fails to 
decode a set of size IS'I = 2^ = 8. For a general rf, using an induction, one can show that the 
worst case failure event is when the FFAST decoder fails to decode a set of size \S\ = 2^. The 
probability of this event is upper bounded by 1/F^ 

C. Achieving the intermediate values of 5 

In this section we show how to extend the scheme in Section |v| that was designed for k oc n^/^, to 
achieve a sparsity regime of oc 72(i+«)/(3+a) a > 0. This extension technique can be essentially 
used in conjunction with any of the operating points described earlier. Thus covering the full range of 
sparsity index < 5 < 1. 

Choose a set of integer factors {T^ijf^o ^hat are pairwise co-prime and 7^^ = F + 0(1), i = 0, 1, 2, 
while = + 0(1) for some a > 0. Let oc Fi+^ and n = ULo^i^ i-^- ^ ^ ^(i+a)/(3+a)^ 
fo = V0V3, fi = V1V3 and /2 = V2V3. 

1 ) Union of Disjoint problems: The check nodes in each of the 3 sets are arranged so that a 



check node in the set i, belongs to the row and the column {j)v3 (see Fig. 11). This is always 
possible using the CRT since {Vi}^=o are pairwise co-prime and fi = ViV3, i = 0, 1, 2. 

A variable node with an associated integer v is uniquely represented by a quadruplet (ro, ri, r2, ra) 
where = {v)p., i = 0, 1, 2, 3 and is connected to the check node (r^, ra) in set i. Thus, the resulting 
bipartite graph is a union of V3 disjoint bi-partite graphs, where each bi-partite subgraph behaves as an 
instance of the 3-stage perfect co-prime case discussed in Section [V} Then, using a union bound over 
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these disjoint graphs one can compute the probabihty of the FFAST decoder successfully decoding all 
but o{k) and all the variable nodes for asymptotic values of k,n. 



VII. Sample and computational complexity of the FFAST algorithm 
The FFAST algorithm performs the following operations in order to compute the n point DFT of an 



n-point discrete-time signal x (see Algorithm [T] in Section [TV] ) 
1) Sub-sampling: A FFAST architecture (see Fig.|2]in Section |l]) with d stages, has d distinct subsam- 
pling patterns chosen as per the discussions in Sections [V] and [Vl} These patterns are deterministic 
and are pre-computed. We assume the presence of random-access-memory, with unit cost per I/O 
operation, for reading the subsamples. For the very-sparse regime (k oc n^, < 5 < 1/3) as 
shown in Section [V} the FFAST architecture has d = 3 stages and D = 2 sub-streams per stage. 
Hence, the total number of samples used for the very-sparse regime is M = 2.44/c (see Section [V] 



Table IV). For the less-sparse regime, choice of the FFAST architecture parameters is bit more 



involved and depends on the sparsity index 5. A d < 8 stage FFAST architecture, in conjunction 



with discussion of Section VI-C, is sufficient to achieve the sparsity index of 1/3 < 5 < 0.99. 



Thus, again using the values from Table |IV] of Section [V] for = 8, for the less-sparse regime of 



1/3 < 5 < 0.99 the number of samples M < 3.74/c. In general for any fixed < 5 < 1, the sample 
complexity M can be as small as r/c, where r is a constant that depends on the sparsity index 5. 

2) DFT: The FFAST algorithm then computes 2d DFT's, each of length approximately equal to k 
(for an order optimal M = rk sample complexity), of the subsampled input data corresponding to 
D = 2 sub-streams in each of the d stages. Using a standard FFT algorithm, e.g., prime-factor FFT 
or Winograd FFT algorithm (iF], one can compute each of these DFT's in 0{k log k) computations. 
Thus, the total computational cost of this step is 0{klogk). 

3) Peeling-decoder over sparse graph codes: It is well known [34], that the computational complexity 
of the peeling-decoder over sparse graph codes is linear in the dimension of the graph, i.e., 0{k). 

Thus, the FFAST algorithm computes a fc-sparse n-point DFT with 0{k) samples using no more than 
0{klogk) computations for all < 5 < 1. 



VIII. Simulation Results 

In this section we validate the empirical performance of our FFAST algorithm for a wide variety of 
1-D and 2-D DFT settings for signals having an exactly or approximately sparse Fourier spectrum. In 



Section |VIII-A| we show the performance of the FFAST algorithm for the 1-D exactly- sparse setting, 
for two regimes, what we call the very-sparse and the less-sparse regimes. We contrast the observed 
empirical performance, in terms of sample complexity, computational complexity and probability of 



success, with the theoretical claims of Theorem |2} Section [VIII-B| provides an empirical evidence of the 



noise robustness of the FFAST algorithm for the 1-D setting when the Fourier spectrum of a desired 



signal is approximately sparse. In Section VIII-C[ we show, how the 1-D FFAST algorithm described in 



this paper can be generalized to a 2-D setting and also provide simulation results, of application of the 
FFAST algorithm, for an exactly and approximately sparse 2-D signals. In the 2-D simulation results 
provided in Section |VIII-C[ the amplitude of the non-zero DFT coefficients are complex valued and not 
limited to the scalar multiples of {±1}, as is the case for 1-D simulations. 



A. Performance of the FFAST algorithm for exactly k-sparse 1-D DFT signals 

• Very sparse regime: The desired signal x is of ambient dimension n = 511 x 512 x 513 ^ 134 x 10^. 
The sparsity parameter k is varied from 400 to 1300 which corresponds to the very- sparse regime 
of oc n^/^. We consider a FFAST architecture with d = 3 stages and D = 2 sub-streams per 
stage, i.e., two delay chains per stage (see Fig. |2]). The uniform sampling periods for the 3 stages 
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Fig. 12: The probability of success of the FFAST algorithm as a function of r], the average number of samples per sub- 
stream normalized by the number of non-zero coefficients k. The plot is obtained for the two different sparsity regimes: 1) 
Very-sparse regime, i.e., k ex n^/^. For this regime, sl d = 3 stage FFAST architecture is chosen. The number of samples 
per sub-stream in the three stages are perfectly co-prime, (/o = 511, /i = 512 and /2 = 513), and 2) Less-sparse regime, 
specifically (n^-^^ < k < n^-^^). For this regime, sl d = 4 stage FFAST architecture is chosen. The number of samples per 
sub-stream in the 4 stages are not co-prime, (/o = 5168, /i = 6783, /2 = 6384 and fs = 5712). Each point on the plot is 
obtained by averaging over 10000 runs. The ambient signal dimension n and the number of samples M are fixed, while 
the number of non-zero coefficients k is varied to get different values of rj. We note that the FFAST algorithm exhibits a 
threshold behavior with r]i = 0.427 being the threshold for the very-sparse regime, and r]2 = 0.354 for the less-sparse regime 
respectively. From Table IV in Section V-Cl we see that the optimal threshold values are r]^ = 0.4073 and r/^ = 0.3237, 
which are in close agreement with our simulation results. 



are 512 x 513, 511 x 513 and 511 x 512 respectively. This results in the number of samples 
per sub-stream, for the three stages to be /o = 511, /i = 512 and /2 = 513 respectively. Note 
that the number of samples per sub-stream in all the three different stages, i.e., //s, are pairwise 
co-prime. The total number of samples used by the FFAST algorithm for this simulation i^ 
M < 2(/o + A + A) = 3072. 

• Less sparse regime: The desired signal x is of ambient dimension n = 16 x 17 x 19 x 21 0. 1 x 10^. 
The sparsity parameter k is varied from 5000 to 19000 which corresponds to the less-sparse regime 
of n^-^^ < k < n^-^^. We consider a FFAST architecture with = 4 stages and D = 2 sub-streams 
per stage. The uniform sampling periods for the 4 stages are 21, 16, 17 and 19 respectively. This 
results in the number of samples per sub-stream, for the four stages to be /o = 16 x 17 x 19 = 5168, 
/i = 17 X 19 X 21 = 6783, A = 19 x 21 x 16 = 6384 and /a = 21 x 16 x 17 = 5712 respectively. 
Note that the number of samples per sub-stream in the four stages are composed of "cyclically- 
shifted" co-prime numbers and are not pairwise co-prime. The total number of samples used by 
the FFAST algorithm for this simulation is M < 2(/o + /i + A + fs) = 48094. 

• For each run, an n-dimensional fc-sparse signal X is generated with non-zero values Xi G {±10} 
and the positions of the non-zero coefficients are chosen uniformly at random in{0,l...,n— 1}. 
The time domain signal x is then generated from X using an IDFT operation. This discrete-time 

^^The samples used by the different sub-streams in the three different stages overlap, e.g., x[0] is common to all the zero delay 
sub-streams in each stage. Hence, M < 2(/o + /i + /2) and not equal. 
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Sparsity Regimes 


Signal dimension 

n 


Sparsity 

k 


k = n\ 
S 


# of stages 
d 


M ^ 2dr]k 




Number oi lailures 


Runs 


M 


T] 


Very- sparse 


511*512*513 
~ 1 Q/i \/ 1 


900 


0.363 


3 


3072 


0.569 


6 


1 


10000 


1000 


0.369 


3072 


0.512 


9 





10000 


1100 


0.374 


3072 


0.465 


13 


1 


10000 


1200 


0.378 


3072 


0.427 


18 


99 


10000 


Less-sparse 


16*17*19*21 
^ 0.1 X 10^ 


13000 


0.81 


4 


48094 


0.462 


6 





10000 


15000 


0.83 


48094 


0.401 


8 





10000 


17000 


0.84 


48094 


0.354 


13 


2 


10000 


19000 


0.85 


48094 


0.316 


29 


10000 


10000 



TABLE V: Shows the performance of the FFAST algorithm for two different sparsity regimes: 1) Very-sparse regime: 
k (X m}^^. For this regime, a = 3 stage FFAST architecture is chosen. The number of samples per sub-stream in 
each of the 3 stages are perfectly co-prime: /o = 511, /i = 512 and /2 = 513 respectively, and 2) Less-sparse regime: 
^0.73 <^ <^ n^-^^. For this regime, a d = 4 stage FFAST architecture is chosen. The number of samples per sub-stream in 
each of the 4 stages are not co-prime but have "cyclically- shifted" overlapping co-prime factors: /o = 16 x 17 x 19 = 5168, 
/i = 17 X 19 X 21 = 6783, = 19 x 21 x 16 = 6384 and /a = 21 x 16 x 17 = 5712 respectively. We note that the FFAST 
algorithm exhibits a threshold behavior in terms of sample complexity M. For example, when r]i > 0.427 and r]2 > 0.354, 
for the very-sparse and the less-sparse regimes respectively, the FFAST algorithm successfully computes all the non-zero 
DFT coefficients for almost all the runs. Further, in one or two instances when it failed to recover all the non-zero DFT 
coefficients, it has recovered all but 8 (for d = 3) or 16 (for d = 4) non-zero DFT coefficients. This validates our theoretical 
findings of the bottleneck failure event being that the FFAST decoder decodes all but a handful of the DFT coefficients. 



From Table IV in Section V-Cl we see that the optimal threshold values for the very-sparse and less-sparse regimes are 
r]l = 0.4073 and = 0.3237 resp., which are in close agreement with the simulation results. The table also shows that the 
average number of iterations £, required for the FFAST algorithm to successfully compute the DFT for both the regimes, 
are quite modest. 



signal X is then given as an input to our FFAST algorithm. 

• Each sample point in the Fig. [12} is generated by averaging over 10000 runs. 

• Decoding is successful if all the DFT coefficients are recovered perfectly. 

We observe the following aspects of the simulations in detail and contrast it with the claims of 
Theorem [2l 

Density Evolution: The density evolution recursion (|6]) in Section |V-C1[ implies a threshold 
behavior: if the average number of samples per sub-stream normalized by k, i.e., ry, is above a certain 
threshold (as specified in Section |V-C1| Table [TV] ), then the recursion ([6]) converges to as ^ ^ oc 
otherwise p£ is strictly bounded away from 0. In Fig. [12] we plot the probability of success, averaged 
over 10000 runs, of the FFAST algorithm as a function of ry, i.e., varying k for a fixed number of 
samples M. We note that the FFAST algorithm exhibits a threshold behavior with r]i = 0.427 being the 
threshold for the very-sparse regime with = 3, and r]2 = 0.354 for the less-sparse regime with d = 4 
respectively. From Table [IV] in Section [V-C1[ we see that the optimal threshold values are r]l = 0.4073 



and r/2 = 0.3237, which are in close agreement with our simulation results of Fig. 12 



The FFAST algorithm is verified to compute the DFT with high probability using as few as M 
samples, where the oversampling ratio r ^ 2dr] < 4 as given in Table [IV| 



rk 



Iterations: In the theoretical analysis of Section |V-C2[ we showed that the FFAST algorithm, if 
successful, decodes all the DFT coefficients in £ iterations, where, ^ is a large but a fixed constant. In 
order to get an empirical sense of how large £ has to be, we consider an example with n = 504, k = 30, 
and a three-stage FFAST architecture. The number of samples per sub-stream in the three stages are 
/o = 56, /i = 72 and /2 = 63 respectively. Further, each stage has D = 2 delay sub streams. An example 
of an n-length input signal x that has a 30-sparse DFT is shown in Fig. |13(a)[ In this simulation, we 
observ e that the FFAST algorithm computes the sparse DFT X perfectly in ^ = 2 iterations, as shown 
in Fig 13(c) using M = 2 ^^^q fi = 382 samples of x. Note that the number of samples M > 4/c for 
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(a) Original /c-sparse n-length DFT X, where k — 30 and 
n 504. 
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(b) Estimated signal X after 1 iteration. 
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(c) Estimated signal X = X after 2 iterations. 



Fig. 13: Consider a FFAST architecture with d = 3 stages, ambient signal dimension n = 504 and sparsity k = 30. The 
number of samples per sub-stream in the three stages are /o = 56, /i = 72 and /2 = 63 respectively. Further, each stage 
has D = 2, delay sub-streams. Thus, the total number of samples M = 2 Xli=o ~ ^^^^ ^^^^ number of samples 
M > 4/c for this example. This is done intentionally for the illustrative purpose and to avoid the clutter in the figure. The 
figure shows a) the original 30-sparse spectrum X. b) the spectrum estimated by the FFAST algorithm after 1 iteration. 
Note that it has correctly recovered 29 out of the 30 non-zero DFT coefficients, c) The FFAST algorithm recovers all the 
30 non-zero DFT coefficients perfectly after 2 iterations. 



this example. This is done intentionally for the illustrative purpose and to avoid the clutter in Fig. [13 
Table |V] shows more empirical values of £ for different settings of the problem. 

Probability of success: The empirical probability of success of the FFAST algorithm for the 
very-sparse as well as the less-sparse regimes is shown in Table [V| We see that as long as the parameter 
T] is above the minimum threshold ry* dictated by Table |IV} in Section |V-C1[ the FFAST algorithm 
indeed computes all the non-zero DFT coefficients successfully with high probability. Further, in one 
or two instances when it failed to recover all the non-zero DFT coefficients, it has recovered all but 8 
(for d = 3) or 16 (for d = 4) non-zero DFT coefficients. Thus, confirming our theoretical guarantees of 
recovering all but a vanishingly small fraction of the non-zero DFT coefficients with probability going 
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Fig. 14: Performance of the FFAST algorithm for computing the DFT of a signal x that has an approximately sparse DPT. 
The approximately sparse DFT X = X' + Z, where X' is exactly /c-sparse and Z is an i.i.d zero mean unit variance Gaussian 
random vector. The non-zero coefficients of X' take values in the set {±^/p}, where the signal to noise ratio (SNR) is 
defined as SNR= E{| |X'| ||/| |Z| ||} = kp/n. The FFAST architecture consists of d = 3 stages, where n = 26970 and 
k = 900. The number of samples per sub-stream for each of the three stages in the FFAST front end architecture are set to 
/o = 870, /i = 930 and /2 = 899 respectively. Further, each stage has D sub-streams, where we choose I) = 3, . . . , 8. Thus, 
the total number of samples used by the FFAST algorithm are no more than M = D Yli=o fi- The decoding is successful 
if all the DFT coefficients are recovered perfectly. Each sample point in the plot is generated by averaging over 1000 runs. 
For example, when D = 5, the number of samples used by the FFAST algorithm are M = 13495, and it recovers all the 
DFT coefficients perfectly, with high probability, for all values of SNR > 18dB. 



to one exponentially in k. 

B. Performance of the FFAST algorithm for approximately k-sparse 1-D DFT signals 

Although the theoretical claims in this paper are for exactly fc-sparse signal x, the FFAST algorithm 
is robust to the tail noise, i.e., approximately sparse signals, as well as to noisy observations. In this 
section we demonstrate the noise robustness of the FFAST algorithm empirically. 

• The signal x is of ambient dimension n = 29 x 30 x31 = 26970, and the number of non-zero 
DFT coefficients k = 900. The FFAST architecture used for this simulation has d = 3 stages. 
The uniform sampling periods for the 3 stages are 31, 29, and 30. This results in the number of 
samples per sub-stream in the three stages to be /o = 29 x 30 = 870, /i = 30 x 31 = 930, and 
/2 = 31 X 29 = 899, respectively. Further, in each stage there are D sub-streams of the sub-sampled 
input data, where = 3, . . . , 8. Note that, unlike the exact-sparse case, the number of sub-streams 
per stage for the approximately sparse setting is now greater than two. This results in increased 
sample complexity, M = D Yl'i=o fi- The additional samples are used for noise-robustness. 

• For each run, an n-dimensional exactly fc-sparse signal X' is generated with non-zero values X- G 
{±y/p}, where p determines the SNR, and the positions of the non-zero coefficients are chosen to 
be uniformly at random in {0, . . . , n — 1}. An n-length random vector Z consisting of i.i.d A/'(0, 1) 
is generated and added to the exactly sparse to get an approximately sparse signal X = X' + Z. 
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The time domain signal x is then obtained by taking the IDFT of X. This discrete-time signal x 
is then given as an input to our FFAST algorithm. 

Signal to noise ratio (SNR) is defined as SNR= E{||X'||2/||Z||2} = kp/n. 
Each sample point in Fig. [14] is generated by averaging over 1000 runs. 
Decoding is successful if all the DFT coefficients are recovered perfectly. 



The performance of the FFAST algorithm for the Gaussian tail noise model is shown in Fig. 14 
Note, for a fixed number of samples M, the FFAST algorithm shows a threshold behavior, i.e., there 
is a threshold SNR above which the FFAST algorithm recovers all the DFT coefficients perfectly. The 
threshold SNR after which the FFAST algorithm successfully decodes all the DFT coefficients decreases 
with increasing number of the samples as one would expect. While our analysis for the exactly sparse 
case requires M < 4/c samples for reconstruction, in the noisy case of approximately sparse signals, we 
end up taking more samples that in general depend on both k and n and also on the SNR and reliability 
requirements. An exact characterization of the increase in sample complexity as a function of SNR and 
reliability guarantees is part of our future work. 

C. Empirical performance of the FFAST algorithm for 2-D DFT signals 

The 1-D FFAST architecture proposed in this paper can be generalized to a multi-dimensional 
DFT with similar sparsity guarantees for the exactly sparse case. As an example, in this section we 
demonstrate an application of the FFAST algorithm for 2-D images. One possible way to extend the 
FFAST architecture of Fig. [2] to 2-D signals is to consider a 2-D signal as a matrix of data, and 
subsample the data separably along the rows and the columns using a small set of uniform sampling 
pattern guided by the CRT. Specifically, we show here an application of the FFAST algorithm for a 
differential image in an Magnetic Resonance Imaging (MRI) for both the exactly fc-sparse as well for 
an approximately fc-sparse images. In MRI, the samples are acquired in the 2-D Fourier domain, and 



the difference images are sparse in the spatial domain. For example, the image shown in Fig. |15(b) 
is a difference image between two consecutive axial slices of brain image and is exactly sparse in the 
spatial domain. The FFAST algorithm acquires the Fourier samples from the spectral domain of the 



differential image shown in Fig. 15(a) and reconstructs the estimated image as shown in Fig. 15(c) 



We provide simulation results for the 3 different types of images with parameters as specified below. 



Brain Image: The brain image shown in Fig. 15(b) is a difference between two consecutive axial 



slices of brain image and is exactly sparse in the spatial domain. The original image is 195 x 308 
pixels, i.e., n = 60060. The FFAST algorithm samples the image in the 2-D Fourier domain, see 
Fig. |15(a)[ using a 3-stage FFAST architecture to get M = 8910 samples. The number of non-zero 



coefficients are k = 3250. The reconstructed image by the FFAST algorithm is shown in Fig. |15(c) 
which is identical to the original image 



Shepp-Logan Phantom Image: The Shepp-Logan Phantom image shown in Fig. 15(e) is also of 



size 195 X 308 pixels, i.e., n = 60060, with exactly k = 812 non-zero pixels. The FFAST algorithm 



perfectly reconstructs the image as shown in Fig. 15(f) using M = 8910 samples. 



^CaF Image: The 'Cal' image shown in Fig. 15(h) is a synthetic image of size 280 x 280 pixels, i.e. 



n = 78400. The spatial domain of the 'cal' image is approximately sparse with k = 3509 significant 
pixels and all other pixels take values from Gaussian CJV{0, a^). The effective SNR is 16dB. The 
FFAST algorithm reconstructs the image shown in Fig. |15(i)| using as few as M = 10.06/c = 35308 
Fourier samples. The reconstruction NMSE = 0.0085. 
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(a) 2-D Fourier transform of a differ- 
ence image between two consecutive 
axial slices of Brain image. 



(b) Exactly k — 3250 sparse n = 
195 X 308 differential Brain image. 



(c) Perfectly reconstructed image us- 
ing the FFAST algorithm from M = 
2.74k = 8910 samples. 




(d) 2-D Fourier transform of a Shepp- (e) Exactly k = 812 sparse n — (f) Perfectly reconstructed image us- 
Logan Phantom Image. 195 x 308 Shepp-Logan Phantom spa- ing the FFAST algorithm from M = 

tial image. 10.97fc = 8910 samples. 






(g) 2-D Fourier transform of a (h) Approximately k = 3509 (i) Reconstructed image using the 

noisy 'Cal' image. sparse (i.e. noisy) n = 280 x FFAST algorithm from M = 

280 'Cal' image. An exactly k 10.06fc = 35308 samples. 

sparse 2-D spatial 'Cal' image 

is corrupted with 2-D i.i.d Gaussian 

noise Z, with an effective SNR 

of 16dB, to obtain a noisy spatial 

domain 'Cal' image X = X' + Z. 



Fig. 15: The figure shows the performance of the FFAST algorithm on the exactly- sparse differential brain image, the 
'Shepp-Logan Phantom' image and an approximately sparse 'Cal' image. The Brain and Shepp-Logan phantom images are 
of size 195 x 308 pixels, i.e., n = 60060, with different levels of sparsity. The 'Cal' image is 280 x 280 and has sparsity 
k = 3509 in addition to the Gaussian tail noise of an effective SNR of 16dB. 



Appendix 

A. Edge degree-distribution polynomial for bins-and-balls model 

The edge-degree distribution polynomial of a bi-partite graph is defined as p{a) = ^^iPi<^^"^, 
where pi denotes the probability of an edge (or fraction of edges) of the graph is connected to a check 
node of degree i. Recall that in the randomized construction based on a balls-and-bins model described 
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in Section |V-A[ every variable node chooses one neighboring node in each of the d subsets of check 
nodes uniformly at random. Thus, the number of edges connected to a check node, in the subset with 
/o check nodes, is a binomial B{l/fo,k) random variable. So when k is large, the binomial distribution 
B{l/fo,k) is well approximated by a Poisson random variable with mean l/ij. Thus, 

Pr (check node has edge degree = i) ^ — — — . (13) 

il 

Let po,i be the fraction of the edges, that are connected to a check node of degree i in set 0. Then, we 
have, 

Po,i = (check node has edge degree = i) 

rv 

(a) ifoil/rjYe-'/^ 

(14) 



k i\ 
(b) (l/?7)*-ie-i/^ 



where (a) follows from Poisson approximation of the Binomial random variable and (6) from /o = 
r]k + 0(1). Since, fi = r]k + 0(1) for alH = 0, 1 . . . , d - 1. We have, 

~ (^-l)! ^^^^ 

and 

p{a) = e-(i-^)/^ (16) 



B. Proof of Lemma ^ 

Proof: We provide a proof using a contradiction argument. If possible let S be the set of the 
variable nodes that the peeling-decoder fails to decode. We have \S\ < ak. Without loss of generality 
let \Ni{S)\ > \N,{S)\, G {0, . . . , - 1}. Then, by the hypothesis of the Theorem \Ni{S)\ > \S\/2. 

Note that the peeling-decoder fails to decode the set S if and only if there are no more singleton 
check nodes in the neighborhood of S and in particular in Ni{S). For all the check nodes in Ni{S) 
to be multi-ton, the number of edges connecting to the check nodes in set Ni{S) have to be at least 
2|A^i(S')| > 15^1. This is a contradiction since there are only \S\ edges going from set S to Ni{S) by 
construction. ■ 



C Proof of Lemma ^ 

Proof: Consider a set S of variable (left) nodes in a random graph from the ensemble Cf(J', m), 
where | J-"! = > 3. Let Ni{S) be the right neighborhood of the set S in the i^^ subset of check nodes, 
for i = 0, 1, . . . , (i — L Also, let Es denote the event that the all the d right neighborhoods of S are of 
size \S\/2 or less, i.e., max{|A^^(S')|}f~Q < \S\/2. First, we compute an upper bound on the probability 



32 



of the event Eq as follows: 



Pr{Es 



< 



(a) 



< 



< 



i=0 



\s\\ 

2F J 

\S\\ 
2F J 

\S\e 
2F 



2/ J 

d\S\ 



\s\ 



( 



d\S\ 



F 

\\S\/2 
2Fe 



fi 
\S\/2 

d 



\S\ 



d\S\/2 



d\S\/2 



(17) 



where the approximation (a) uses fi = F + 0(1) for all i = 0, . . . , d — 1. Next, using a union bound, 
over all possible sets of size \S\, we get an upper bound on the probability of an event E, that there 
exists some set of variable nodes of size l^"], whose all the d right neighborhoods are of size \S\/2 or 
less. 



Pr{E) < Pr{Es) 



k 

\S\ 



< 



(b) 
< 



2F 

M 

F 



d-2 , / \ 2 



\S\/2 



< 0{{\S\/mp/^) (18) 

where in (6) we used F = r]k and in the last inequality we have used d > 3 and m = 0(F). Then, 
specializing the bound in ([18]) for IS'I = a/c, for small enough a > 0, and \S\ = o{k) wt get, 
• For \S\ = ak, for small enough a > 0: 

Pr{E) < e-^^^^s(^/^\ for some 6>0 (19) 

. For \S\ =o{k): 

Pr{E) < 0{l/m) (20) 



D. Proof of Lemma ^ 

Proof: a) [Expected behavior] Consider decoding on a random graph in the ensemble Cl{T,m). 
Let Zi, i = 0, . . . , fcrf — 1, be an indicator random variable that takes value 1 if the edge is not 
decoded after £ iterations of the peeling-decoder and otherwise. Then by symmetry K[Z] = kdE[Zi]. 
Next, we compute E[Zi] as, 

E[Zi] = E[Zi I A/|/ is tree-like] Pr(A/|/ is tree-like) 

+E[Zi I A/|f is not tree-like] Pr(A/|f is not tree-like) 
< E[Zi\ Ml[ is tree-like] + Pr{Ml[ is not tree-like). 
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In Appendix |e| we show that Pr{J\fg^ is not tree-hke) < 7/F for some positive constant 7. Also we 
have E[Zi | A/J^^is tree-like] = by definition. Thus, 

E[Z] < kdpi + —, (21) 

where we have used F = r]k. 

b) [Concentration] In this part, we want to show that the number of edges Z, that are not decoded 
at the end of £ iterations of the peeling-decoder, is highly concentrated around kdpi. There are two 
potential ways of an edge not being decoded after i iterations: 1) the 2£ depth neighborhood of the 
edge is not tree-like or 2) the neighborhood is tree-like but the edge belongs to the fraction of edges 
that are not decoded after £ rounds (as per density evolution equation (|6])). Let T be the number of 
edges, out of the total of kd, that have tree-like neighborhood of depth 2£. Let V C T, be the number 
of edges not decoded despite having tree-like neighborhood. Then from Appendix |E] we know, 

E[r] > kd{l-e/4) (22) 

for any e > 0. Moreover, from equation ([6]) 

E[r] = Tp,. (23) 

Next, we obtain a concentration result for T, by using the (now) standard argument of exposing the 
k variable nodes vj, j = 1, . . . fc, (and hence the edges in the graph) one by one to set up a Doob's 
martingale and applying Azuma's inequality. In particular, let Yi = E[r|^^] be the expected value of 
T after exposing i variable nodes. Then, Yq = E[r] and Yk = T and the sequence Yi forms a Doob 
martingale. 

Now, consider any pair of graphs (determined by the choice of variable nodes) that differ only 
on {i + ly^ variable node. The number of edges with tree-like neighborhood of depth 2£ in these 
graphs differ at most by a constant number, since the differing variable node can influence the 2£ depth 
neighborhood of only constant number of edges (since the left edge degree is a constant). Hence, 
\Yi^i — Yi\< Ci, for some constant q V i = 0, . . . , — L Hence, using Azuma's inequality along with 
([22]), we get, 

Pr {kd-T> ekd/2) < exp{-/3ie^k), (24) 

for some constant Pi that depends on the left degree d and constant 77. Using Azuma's inequality the 
following concentration result for V holds, 

Pr {\r -Tp^\> ekd/2) < 2exp{-f32e^k). (25) 

The assertion then follows from ([24]), ([25]) and r < Z < r + \kd - T\. ■ 



E. Probability of Tree-like Neighborhood 

Consider an edge e in a randomly chosen graph Q G Ci{J', m). Next, we show that the neighborhood 
A/|^* of e is tree-like with high probability, for any fixed t. Towards that end, we first assume that 
the neighborhood A/|^ of depth 2£, where ^ < is tree-like and show that it remains to be tree-like 
when extended to depth 2£+l w.h.p. Let C^^i be the number of check nodes, from set i, and be the 
number of variable nodes, present in A/|^. Also assume that t more edges from the leaf variable nodes 
in A/|^ to the check nodes at depth 2^ + 1 are revealed without creating a loop. Then, the probability 
that the next revealed edge from a leaf variable node to a check node (say in set i) does not create 
a loop is > 1 — T^T^' Thus, the probability that A/'^^^^ is tree-like, given JV^^ is tree-like, 

is lower bounded by min^(l — )Q+i,i-Q,i^ Similarly assume that M^^^^ is tree-like and s more 

edges from check nodes to the variable nodes at depth 2^ + 2 are revealed without creating a loop. 
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Note the probability that a check node has degree > 2 is upper bounded by fc/F and conditioned on 
the event that a check node has an outgoing edge it has equal chance of connecting to any of the edges 
of the variable nodes that are not yet connected to any check node. Thus, the probability of revealing 
a loop creating edge from a check node to a variable node at depth 2^ + 2 is upper bounded by, 
(fc/F) (l - ^E^S^) < F(S^- Thus, the probability that N^^^^ is tree-like given A/?^+^ is tree-like 
is lower bounded by (1 - - 

It now follows that the probability that A/"!^* is tree-like is lower bounded by 

min 1 1 

^ \ F{k-M,.)J \ f,-C,. 

Hence, for k sufficiently large and fixed 

Pr (A/'2^*not tree-like) < max — ^ + ^ < J 

for some constant 7 > 0, since for any fixed value of Ci*^i and M^* are constant w.h.p and fi 
F + 0(1) for alH = 0,...,d-l. 
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